Abstract

Unless there is evidence for fractal scaling with a single exponent over distances $0.1 < r <
100\,h^{-1}$ Mpc then the widely accepted notion of scale invariance of the correlation integral
for $0.1 < r < 10\,h^{-1}$ Mpc must be questioned. The attempt to extract a scaling exponent
$\nu$ from the correlation integral $n(r)$ by plotting $\log n(r)$ vs. $\log r$ is unreliable unless
the underlying point set is approximately monofractal. The extraction of a spectrum of
generalized dimensions $\nu_q$ from a plot of the correlation integral generating function $G_n(q)$
by a similar procedure is probably an indication that $G_n(q)$ does not scale at all. We explain
these assertions after defining the term multifractal, mutually-inconsistent definitions having 
been confused together in the cosmology literature. Part of this confusion is traced to a 
misleading speculation made earlier in the dynamical systems theory literature, while other 
errors follow from confusing together entirely different definitions of "multifractal" from two 
different schools of thought. Most important are serious errors in data analysis that follow 
from taking for granted a largest term approximation that is inevitably advertised in the 
literature on both fractals and dynamical systems theory. 



1 Introduction 

Knowledge of the three-dimensional distribution of matter in the universe at $r > 150\,h^{-1}$ Mpc is
limited. We do not know if the matter distribution over scales $r \gtrsim 150\,h^{-1}$ Mpc is homogeneous or
isotropic (background radiation, self-consistency of the standard model based on the assumption
of a stable uniform density, etc. do not provide direct evidence about the distribution of visible
matter in the present epoch). For $r < 150\,h^{-1}$ Mpc the distribution of visible matter is clearly
inhomogeneous, with large voids and clustering, and various analyses have produced results that
are equivalent to claiming scale invariance for the correlation integral, that $n(r) \approx r^{\nu}$, with one
school ([?], [?]) reporting that $\nu \approx 1.23$ for $0.1 < r < 10\,h^{-1}$ Mpc, whereas the other ([?])
reports that $\nu \approx 2$ for $2 < r < 150\,h^{-1}$ Mpc. The correlation integral and scaling exponent $\nu$ are
defined in part [?]. Roughly speaking, one can think of the scaling exponent $\nu$ as a correlation
dimension, but not as Hausdorff or box-counting dimension. We will explain why the reported
claims of scale invariance may be spurious unless the matter distribution actually is, to a very
good approximation, monofractal. The first aim of this paper is to explain the need for a more
careful analysis of observational data than has heretofore been performed. The second aim is to
provide that analysis (part [?]). The third is to explain and eliminate the confusion over the term
multifractal. Finally we will explain why a "nonanalytic" density does not rule out the use of
differential equations.

Figure 1: Log-log plot of a function $f(r) = 20\,r^{\gamma_1} + 0.1\,r^{\gamma_2}$ (stars), with $\gamma_1 = 1$ and $\gamma_2 =$ [?],
together with the function $18\,r^{\gamma_3}$ (dashed line) with $\gamma_3 = 0.9$.

In any attempt to extract scaling exponents from log-log plots of correlation or generating
functions a conservative criterion in both critical phenomena and dynamical systems theory is
that linearity should be exhibited over at least three decades, which would require data out to at
least $r = 1000\,h^{-1}$ Mpc in astronomy. The reason for this is that there are many different
functions $f(r)$ that do not scale with $r$, $f(\lambda r) \neq \lambda^{\alpha} f(r)$, but for which $\log f(r)$ vs. $\log r$ may
nevertheless appear to have a constant slope over a short enough range of $r$. The function
$f(r) = c_1 r^{a} + c_2 r^{b}$ provides a relevant example: this function is not scale invariant because it
contains two exponents $a$ and $b$, but it is easy to produce the illusion of scale invariance by
plotting $\log f(r)$ vs. $\log r$ over only two decades (see figure 1). If one questions the controversial
claim of scale invariance over two decades up to $r \approx 150\,h^{-1}$ Mpc, then one must also question the
widely accepted claim of scale invariance, also over only two decades, up to $r \approx 10\,h^{-1}$ Mpc.
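The point about two exponents can be checked numerically. The following is a minimal sketch, not the analysis used in this paper: the coefficients 20 and 0.1 and the exponent $a = 1$ follow the caption of figure 1, while $b = 2$ is an assumed value chosen only for illustration. It fits a straight line to $\log f$ vs. $\log r$ over two decades and compares the quality of the fit with the drift of the local slope.

```python
import math

def f(r, c1=20.0, a=1.0, c2=0.1, b=2.0):
    """Two-term power law c1*r**a + c2*r**b (b = 2 is an assumed, illustrative value)."""
    return c1 * r**a + c2 * r**b

def local_slope(r, eps=1e-4):
    """Numerical d log f / d log r at r (centered difference in log r)."""
    up, down = r * math.exp(eps), r * math.exp(-eps)
    return (math.log(f(up)) - math.log(f(down))) / (2 * eps)

def fit_line(xs, ys):
    """Ordinary least squares y = slope*x + intercept; returns slope, intercept, R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy * sxy / (sxx * syy)

# Sample two decades, 1 <= r <= 100, evenly spaced in log r.
rs = [10 ** (2 * k / 50) for k in range(51)]
slope, intercept, r2 = fit_line([math.log(r) for r in rs],
                                [math.log(f(r)) for r in rs])

print(f"fitted slope = {slope:.3f}, R^2 = {r2:.4f}")
print(f"local slope at r = 1: {local_slope(1.0):.3f}, at r = 100: {local_slope(100.0):.3f}")
```

With these assumed values the straight-line fit looks excellent even though the local slope changes by roughly a third across the range, which is the illusion described above.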

To make our viewpoint clear to the reader we recapitulate briefly the controversy over scaling
in this field. The earliest attempts to extract $\nu$ were based upon the expectation of homogeneity
at larger scales, with inhomogeneities confined to $r < r_0$, where $r_0$ is a correlation length. By
confusing an amplitude with a correlation (i.e. characteristic) length ([?]), $r_0 \approx 5\,h^{-1}$ Mpc was
obtained, which is inconsistent with the observed clustering and voids out to $r \approx 150\,h^{-1}$ Mpc,
and beyond. Coleman and Pietronero [13] and Martinez et al. [?] have argued instead that the
known data are scale invariant, and correspondingly that there is no correlation length. Early
attempts to dismiss this point of view as the result of "large deviations" due to "unfair samples"
failed when it became clear that the apparent scale invariance is the rule and not the exception.
As a zeroth order description this seems to be correct, but we show in part [?] that a refinement of
the method of [13] (see part [?]) leads to the conclusion that the data are inadequate to draw a
conclusion for or against multifractal scaling, although it is clear that simple fractal scaling, with
a single exponent $\nu$ as claimed by the Pietronero school, is not indicated by the data. Fractal or
not, it is clear that there is, as yet, no evidence from galaxy redshift surveys of any crossover to
homogeneity.

2 Why should anything scale?



This is a good question because, as one expert in statistical mechanics and nonlinear dynamics 
put it, if you can't calculate anything then you can still talk about scaling. Furthermore, most 
phenomena in nature do not scale, or at least have not been shown to scale (the polls are not yet 
closed on the question of multifractal scaling in the inertial ([?]; [?]) and dissipation ([?] & [?])
ranges of fluid turbulence, and the early returns are not entirely convincing). In fact, there are 
only a few known reasons why anything should scale, aside from dimensional analysis (Reynolds 
number scaling, which works pretty well in fluid mechanics so long as one sticks to qualitative 
considerations and does not look hard at the numbers). Let us enumerate the (known) ways. 

First, there is scaling of all sorts of correlation functions and other thermodynamic quantities
if you are close enough (within $(T - T_c)/T_c$ of $10^{-3}$, at least) to a second order phase
transition. The problem with this is first that the universe is not in thermodynamic equilibrium (tde):
nonuniformities like spiral galaxies and DNA are not generated systematically at small scales in a
system in tde. Second, there is no reason why the universe should be tuned precisely to a critical
point (where $T \approx T_c$). Critical phenomena are popular because the scaling indices are universal,
depending only on symmetry and dimension, which allows a theorist to forget about worse details
than those that plague experimentalists and calculate the scaling indices of real ferromagnets, for
example, by using Ising models or $\phi^4$ models ([32]).

We can also consider dynamical critical phenomena, where universality classes can still be 
defined for scaling exponents based only on symmetry and dimension, but that doesn't help: we 
are still restricted to systems that have only very small deviations from thermal equilibrium. Large 
excursions from tde aren't allowed at small scales in these systems. 

Galaxies have been modelled on the basis of critical phenomena far from tde by using a
particular cellular automaton ([76]) near the percolation threshold. Nice patterns can be produced
that look like spiral galaxies ([38]), but who tunes the galactic system to stay near the percolation
threshold? This model doesn't yet have enough physics in it to be falsifiable.

For those who believe that scaling is ubiquitous in nature, but don't expect that Mother Nature 
tunes phenomena to a critical point, there is SOC (self-organized criticality). The idea of SOC 
([?]) is based on driven dissipative dynamical systems far from equilibrium that quite naturally lie
at a borderline of chaos, for a large range of parameter values, and therefore require no parameter- 
tuning. Criticality (a borderline of chaos) means that all Liapunov exponents must vanish (some 
positive exponents are usually allowed in the literature because models where all of them vanish 
for a finite range of control parameter values ([18]) are unknown). Scaling exponents in SOC
are argued to be universal because they are expected to be parameter-independent. The main 
problem with SOC is that no one has yet found an example of a dynamical system where the idea 
has been realized, criticality without parameter tuning, criticality that persists while parameters 
are varied. The SOC idea is usually illustrated by a sandpile model that has no tunable parameters 
because the parameters in the model were implicitly tuned to criticality and then forgotten. SOC 
purports to provide a universal explanation of fractal scaling indices which, if we follow the scaling 
enthusiasts, should be ubiquitous in nature, but fractal and multifractal exponents are not universal 
and can't be used to define universality classes. The different models used to try to describe the 
(still inadequate) data on the inertial and dissipation ranges of fluid turbulence provide examples 
of this ([51] & [?]). Sandpile models of SOC reproduce certain qualitative features of block spring
models of earthquakes, but the block spring models do not produce the parameter-independent 
criticality demanded by SOC ([?]). SOC has not been defined unambiguously because universality
classes for SOC have not been defined. In the absence of universality classes one cannot claim 
that a simple automaton like the sandpile model represents a complicated or complex dynamical 
system that occurs in nature ([?]).

Then, there are the fractals that are generated in the phase spaces of critical and chaotic 
driven-dissipative dynamical systems far from thermal equilibrium. The scaling indices that de- 
scribe the fractals that occur in chaos are not universal. That's ok, because while the fractal 
dimensions are parameter-dependent the fractals persist (in distorted form) as the parameters are 
varied over relatively wide ranges (this is usually what happens in "SOC" models too). Given
the coarsegrained fractal support generated by a driven-dissipative dynamical system ("support"
of a distribution is defined in part 3.5), nonuniform distributions on that support are typically 
multifractal, meaning that the coarsegrained density becomes more and more spiky (and perhaps 
also intermittent with voids) as the distribution is resolved at finer and finer scales of observation. 
The corresponding densities would be nearly everywhere nondifferentiable if the mathematicians' 
fiction of an infinite-precision limit were not ruled out empirically. 

Conservative dynamical systems (like gravity without dissipation/driving) cannot generate 
fractals in phase space: the support of any distribution generated by a conservative dynamical 
system is space-filling (Liouville's theorem). Space-filling means that the support has the dimen- 
sion of the phase space, whereas a fractal support has a nonintegral dimension less than that of 
the phase space. However, a conservative dynamical system far from thermal equilibrium can 
also generate multifractal coarsegrained distributions on the space-filling support. A noninteger
correlation integral scaling exponent $\nu$ does not suggest that the galaxy distribution has a fractal
support: nonintegral $\nu$ is consistent with multifractal distributions on space-filling supports.

The problem with all of this is that it does not explain anything, as yet: the fractals and mul- 
tifractals discussed above all occur in the very high dimensional phase space of all of the galaxies 
(each galaxy is treated as a point particle here and below), and we do not know how those distri- 
butions would look when projected onto the three dimensional space of observation in astronomy. 
In other words, we don't have a quantitative explanation for where fractal (including multifractal) 
galaxy clusters should come from. In practice, it makes more sense to consider hydrodynamic 
models of galaxy formation and clustering. Hydrodynamics demands a coarsegrained description 
of the density. Coarsegrained descriptions are precisely what are provided by the multifractal 
formalism (see Vergassola et al. [75] and references therein for a hydrodynamic approach to
clustering and voids). Unable at this time to contribute to the dynamical theory of galaxy formation,
let us forget temporarily that no theorist can yet convincingly explain the origin of fractal galaxy 
clustering, if it exists, and turn instead to the question how astronomical data should be ana- 
lyzed in order to decide the much easier question whether fractal clustering is indicated by the 
observational data. 



3 Coarsegraining and fractals 

3.1 Clustering, voids, and efficient partitions 

In what follows we consider only finitely many data points $N$ in some space, the observational
data. To present the ideas in the clearest possible way we assume that the space is one-dimensional
(excepting section [?] on the correlation integral, where the dimension of space is irrelevant). The
ideas of sections 3.1-3.3 are admittedly heuristic and can only be made rigorous by the use of
generating partitions found in the phase space of certain nonintegrable dynamical systems ([16]).
In particular the heuristic description is limited to one dimension (generating partitions are not so
limited), but our one dimensional treatment is adequate to the purpose of resolving the prevailing
confusion over "multifractals" and "nonanalytic" densities.

We assume that the $N$ data points in our one dimensional space are confined to the unit
interval, because any finite interval can be converted into the unit interval by rescaling. Lengths
are denoted by $l$, with $0 < l < 1$ in all that follows (planar and three dimensional cosmological data
are also assumed to be rescaled so that $0 < r < 1$ in the discussion of part [?]).

Coarsegraining of the data set requires only that we cover the $N$ points by $N_n$ nonoverlapping
intervals of size $l^{(n)}$. Clearly, we can take $N_n = 2^n$ intervals of size $l^{(n)} = 2^{-n}$, $N_n = 10^n$
intervals of size $l^{(n)} = 10^{-n}$, and so on. All that is required to avoid overlap is $l^{(1)} \le 1/2$;
coarsegraining per se is not unique. The only other requirement, so far, is that we must choose the
intervals so that $l^{(n)} \ge l_{\min}$, where $l_{\min}$ is the smallest distance between two points in the sample
(a single point cannot be coarsegrained, so that an interval containing only one point is meaningless).
We shall see that the desire of the theorist to approximate $l_{\min}$ by $\epsilon$, where $\epsilon$ goes to zero, has led to
serious errors in data analysis. In our analysis $l_{\min}$ is always finite because it contains at least two
real data points.

Figure 2: Nonuniform clustering with uniform partitionings. There is no way to construct an
efficient, uniform coarsegraining $l^{(2)}$.

In any efficient coarsegraining the intervals that cover the points must be nonoverlapping.
Coarsegrainings that are space-filling (meaning that $N_n l^{(n)} = 1$) satisfy this criterion but do not
separate voids from clusters. For data with voids we should construct a coarsegraining that is
not space-filling, one where $N_1 l^{(1)} < 1$, in order to cover all of the clusters while excluding the
largest voids. Such a coarsegraining is more efficient than an arbitrary one. With the desire for
efficiency in mind the idea is first to remove all of the largest voids. Then we choose the interval
size $l^{(1)} \approx (1 - \sum_i v_{1,i})/N_1$, where the intervals $v_{1,i}$, all of roughly the same characteristic size,
represent the $M_1$ largest voids in the sample, and $N_1 = M_1 + 1$. The $N_1$ first generation intervals
required to cover the $N$ data points are not space-filling because $N_1 l^{(1)} < 1$, by construction. If,
beneath the $N_1$ intervals now covering the data, there are still voids and clustering then we can
continue systematically by removing all voids of next largest characteristic size: in the second
generation of coarsegraining simply choose $N_2$ intervals of size $l^{(2)} \approx (1 - \sum_i v_{2,i})/N_2$, where the
intervals $v_{2,i}$ represent the sizes of the $M_2$ largest voids covered by the $N_1$ intervals. This iterative
procedure may be continued so long as we can distinguish clusters from voids. There are two ways
that it can terminate. Either we reach a scale $l^{(n)} > l_{\min}$ where the points are relatively evenly
spaced over those intervals (so that there is no longer a distinction between clusters and voids),
or else clustering continues all the way down to the finite limit $l^{(n)} = l_{\min}$, where $l_{\min}$ roughly
characterizes an interparticle spacing and will be defined more precisely in part 3.3.

The procedure outlined above describes the idea of a more efficient partitioning than a
space-filling one, a more efficient coarsegraining of the data set because the largest voids are
systematically excluded, generation by generation in $n$. We do not have only one partition but a
hierarchy $n = 1, 2, \ldots, n_{\max}$ of partitions, each with interval sizes $l^{(n)}$.
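The first-generation step of this procedure can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `remove_largest_voids`, the sample points, and the choice of how many voids to delete (`m_voids`) are all hypothetical, and in practice deciding which voids share "roughly the same characteristic size" is left to the analyst.

```python
def remove_largest_voids(points, m_voids):
    """Split points in [0, 1] into clusters by deleting the m_voids largest gaps.

    Returns (clusters, l_uniform), where l_uniform = (1 - sum of deleted voids) / N_1
    and N_1 = m_voids + 1, following the first-generation rule in the text.
    """
    xs = sorted(points)
    gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
    # Indices of the largest gaps; a gap with index i separates xs[i] from xs[i + 1].
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[:m_voids])
    void_total = sum(xs[i + 1] - xs[i] for i in cut_after)

    clusters, start = [], 0
    for i in cut_after:
        clusters.append(xs[start:i + 1])
        start = i + 1
    clusters.append(xs[start:])

    l_uniform = (1.0 - void_total) / (m_voids + 1)
    return clusters, l_uniform

# Hypothetical sample: three uniform clusters separated by two equal voids.
sample = [0.0, 0.05, 0.1, 0.45, 0.5, 0.55, 0.9, 0.95, 1.0]
clusters, l1 = remove_largest_voids(sample, m_voids=2)
print(len(clusters), round(l1, 6), round(len(clusters) * l1, 6))
```

For this uniform sample the covering is not space-filling ($N_1 l^{(1)} < 1$). Applying the same function inside each cluster would give a second generation.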

A pencil and paper sketch of about sixteen points with two big voids of roughly the same
size, but with three or more very nonuniform first generation clusters, e.g. figure 2, shows that
the method described above will produce an efficient partition only if the clustering is relatively
uniform, only if all clusters in each generation are of about the same size. When this is
not the case, when the clustering is very nonuniform (as in figure 2), then the procedure outlined
above will not produce an efficient partition and may even fail to cover the set. In that case we
could try to repair the misfit by taking $l^{(n)}$ to be the largest of the intervals in the nth generation,
but this may produce overlapping intervals, $N_n l^{(n)} > 1$, which is intolerable.

We explain in the next section how "convergence problems" arise in the analysis of empirical 
data whenever arbitrary (rather than efficient) partitions are used. 

The motivation for the expectation that an optimal partition may exist, at least in some cases,
is as follows: certain nonintegrable dynamical systems coarsegrain phase space uniquely ([?], [16]).
In those systems the optimal partition is generated by the dynamics and is called the "generating
partition". As a simple example, the invariant set of the ternary tent map is the middle thirds
Cantor set ([53]). The generating partition of the ternary tent map (obtained by $n$ backward
iterations of the unit interval using that map) is given by $N_n = 2^n$ intervals $l^{(n)} = 3^{-n}$, for
$n = 1, 2, \ldots$. It is impossible to construct a more efficient coarsegraining of the middle thirds Cantor
set than this one. The voids are the excluded open intervals (1/3, 2/3), (1/9, 2/9), (7/9, 8/9), and
so on (initial conditions of the ternary tent map that lie in the voids iterate to minus infinity).

Figure 3: On top we show the optimal partitioning for the middle-thirds Cantor set, on the
bottom a very inefficient uniform partitioning.
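The backward-iteration construction of the generating partition can be sketched as follows. The sketch assumes the standard form of the ternary tent map, $f(x) = 3x$ for $x < 1/2$ and $f(x) = 3(1 - x)$ otherwise; the function names are hypothetical. Exact rational arithmetic is used so that the interval end points can be compared directly with those quoted in the text.

```python
from fractions import Fraction

def backward_iterate(intervals):
    """One backward iteration of the ternary tent map.

    Each interval [lo, hi] has two preimages: [lo/3, hi/3] and [1 - hi/3, 1 - lo/3].
    """
    out = []
    for lo, hi in intervals:
        out.append((lo / 3, hi / 3))
        out.append((1 - hi / 3, 1 - lo / 3))
    return sorted(out)

def generating_partition(n):
    """n-th generation intervals obtained from n backward iterations of [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = backward_iterate(intervals)
    return intervals

gen2 = generating_partition(2)
print([(str(lo), str(hi)) for lo, hi in gen2])
```

Each generation doubles the number of intervals and shrinks them by a factor of three, which is the $N_n = 2^n$, $l^{(n)} = 3^{-n}$ hierarchy described above.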



3.2 Scale invariant clustering 

We restrict our considerations in this section to relatively uniform clustering (the clusters in a
generation $n$ of coarsegraining are all of about the same size $l^{(n)}$, and the nth generation voids
are also all of about the same size $v_n$). Invariant quantities (scaling exponents) can only be
constructed, if at all, from $N_n$ and $l^{(n)}$ as the generation $n$ of coarsegraining is increased, as we
look at the data set with finer and finer, but never with pointwise, resolution (the smallest interval
size $l_{\min}$ always contains at least two data points).

In what follows it is conceptually useful to think of the hierarchy of intervals for $n = 1, 2, \ldots$
as sitting on the branches of a tree of some order $t$. There are $N_1$ branches in generation 1, $N_2$ in
generation 2, and so on, and the tree need not be complete. Scale invariance will be seen below
to require that $N_n$ increases exponentially, $N_n \approx t^n$, as $l^{(n)}$ decreases exponentially, $l^{(n)} \approx a^{-n}$.
Complete trees have $N_n = t^n$ branches in generation $n$ with $t \ge 2$ an integer, while incomplete
ones have noninteger $t$, in which case the order of the tree is the next integer larger than $t$. The
middle thirds Cantor set, an idealized model of uniform clustering exhibiting big voids on all
scales, defines a complete binary tree ($t = 2$).

We are only treating coarsegrained versions of $N$ points, so the simplest kind of scale invariance
is geometric (and so is the more complicated kind, as we shall see in parts 3.4 and 3.5 below).
A fine-grained picture of the subset of any branch for $n > 2$ looks, upon magnification, like the
entire tree. This kind of scale invariance ([?]) is expressed by the exponent $D_0$ in

$$N_n \left(l^{(n)}\right)^{D_0} \approx 1. \qquad (1)$$

In other words, with $N_n \approx t^n$ and $l^{(n)} \approx a^{-n}$, $D_0 \approx \ln t/\ln a$ is an exponent that reflects
self-similarity of a hierarchy of relatively uniform clusters (clusters within clusters within ...).

In order to check whether (1) holds approximately for a given (very uniform) data set it is
very useful first to find as efficient a partition as possible. The practical difference (emphasizing 
the famous "convergence" problem) between the use of efficient and inefficient partitions is best 
illustrated by an idealized example. 

As an example of an optimal partition consider a very artificial data set constructed as follows:
arbitrarily choose a finite number $N$ of points generated by ternary expansions of the form
$x = 0.\epsilon_1 \ldots \epsilon_N \ldots$ with $\epsilon_i = 0$ or $2$ ($\epsilon_i = 1$ is excluded). These numbers belong to the middle-thirds
Cantor set (all ternary numbers of this kind define the middle-thirds Cantor set [?]). Terminating
strings $\epsilon_1 \ldots \epsilon_N 00000\ldots$ define $N_n = 2^n$ intervals $l^{(n)} = 3^{-n}$ given by [0, 1/3] and [2/3, 1] in
generation one ($n = 1$), [0, 1/9], [2/9, 1/3], [2/3, 7/9], and [8/9, 1] for $n = 2$, and so on. Rational
numbers of the form $x = 0.\epsilon_1 \epsilon_2 \ldots \epsilon_N \ldots$ (periodic expansions like $x = 0.020202\ldots$) and irrational ones
$x = 0.\epsilon_1 \ldots \epsilon_N \ldots$ (where the digit string is nonperiodic) are also covered by the $N_n$ intervals,
so that the covering provided by those $N_n$ intervals $l^{(n)} = 3^{-n}$ is optimal, generation by
generation. The scaling law (1) yields $D_0 = \ln 2/\ln 3$. This example illustrates a "monofractal"
because the optimal covering is uniform (all $N_n$ of the optimal intervals in one generation have
the same size $l^{(n)}$).

Figure 4: An efficient (nonuniform) partition for the point set shown in Figure 2.

In contrast, we can try to estimate $D_0$ by using the uniform nonoptimal covering $l^{(n)} = 2^{-n}$,
a space-filling partition given by [0, 1/2] and [1/2, 1] for $n = 1$, [0, 1/4], [1/4, 1/2], [1/2, 3/4], and
[3/4, 1] for $n = 2$, etc., that ignores the voids altogether. Here, with $N$ points in the data set,
$N \gg N_n \gg 1$ is required before we can expect to observe scaling with an exponent close
to $D_0 = \ln 2/\ln 3$: for $N = 16$ points, e.g., and using $N_n \approx (l^{(n)})^{-D}$, then from $n = 1$ and 2 one gets
$D = 1$, $n = 3$ yields $D = 0.86$, and further attempts to extract $D_0$ are impossible unless the number
$N$ of data points is increased. This illustrates why, in practice, only relatively efficient partitions
are of interest. We return next to the search for the optimal partition of a typically nonuniform
empirical data set of $N$ points.
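The numbers quoted for the 16-point example can be reproduced with a short box count. The sketch below assumes that the 16 points are all four-digit ternary strings with digits 0 or 2 (one natural reading of the example, but an assumption nonetheless); the helper names are hypothetical. It counts occupied intervals for the space-filling binary grid and, for comparison, for the ternary grid that coincides with the optimal covering.

```python
from fractions import Fraction
from itertools import product
import math

def cantor_points(digits):
    """All 2**digits points 0.e1 e2 ... (base 3) with every digit equal to 0 or 2."""
    return [sum(Fraction(e, 3 ** (k + 1)) for k, e in enumerate(es))
            for es in product((0, 2), repeat=digits)]

def occupied(points, base, n):
    """Number of grid intervals of size base**-n on [0, 1] that contain a point."""
    cells = base ** n
    return len({min(int(x * cells), cells - 1) for x in points})

def dimension_estimate(points, base, n):
    """D obtained from N_n = l**(-D) with l = base**-n."""
    return math.log(occupied(points, base, n)) / (n * math.log(base))

pts = cantor_points(4)  # N = 16 points
for n in (1, 2, 3):
    print(n,
          occupied(pts, 2, n), round(dimension_estimate(pts, 2, n), 2),
          occupied(pts, 3, n), round(dimension_estimate(pts, 3, n), 4))
```

The binary grid gives $D = 1$ for $n = 1, 2$ and $D \approx 0.86$ for $n = 3$, while the ternary grid returns $\ln 2/\ln 3$ in every generation, which is the "convergence" contrast drawn in the text.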



3.3 The optimal partition of an empirical data set 

For a typical empirical data set of $N$ points the most efficient partition that one can construct will
rarely be uniform. Let $l^{(n)}$ denote the size of the largest cluster in generation $n$ after deleting the
largest voids, as in part 3.1. This may yield an overlapping partition of $N_n$ uniform intervals
where $N_n l^{(n)} > 1$, but we can immediately improve upon that coarsegraining: in any generation
$n$ the $N_n$ intervals so-constructed will (excepting the largest interval, which determines $l^{(n)}$) not
end on data points but will extend beyond them. To make the covering efficient simply shrink
each interval until it ends on the nearest two points of the cluster that it was intended to cover
in the first place. The result is that the number of intervals is exactly the same as before, but we
now have $N_n$ nonuniform intervals obeying both $l_1 + \ldots + l_{N_n} < 1$ and $l_1 + \ldots + l_{N_n} < N_n l^{(n)}$. In
other words, we have minimized the sum $l_1 + \ldots + l_{N_n}$ while holding $N_n$ fixed. It is hard to see
how a more efficient covering can be found, so for the purpose of this paper we call the hierarchy
of intervals, so-constructed, the optimal partition (see Figure 4).
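The shrinking step can be illustrated with a small sketch. The cluster list below is hypothetical and deliberately very nonuniform, and the helper names are not from the original; each cluster is assumed to be sorted and to contain at least two points.

```python
def optimal_intervals(clusters):
    """Shrink each covering interval so that it ends on the outermost points of its cluster."""
    return [(c[0], c[-1]) for c in clusters]

def covering_summary(clusters):
    """Compare the shrunk (nonuniform) covering with the uniform one built on the largest cluster."""
    intervals = optimal_intervals(clusters)
    lengths = [hi - lo for lo, hi in intervals]
    l_largest = max(lengths)  # plays the role of l^(n)
    return {
        "N_n": len(intervals),
        "sum_nonuniform": sum(lengths),                 # l_1 + ... + l_{N_n}
        "uniform_total": len(intervals) * l_largest,    # N_n * l^(n)
    }

# Hypothetical first-generation clusters on the unit interval (already sorted).
clusters = [[0.00, 0.02, 0.05], [0.30, 0.40, 0.55, 0.70], [0.95, 0.97]]
summary = covering_summary(clusters)
print(summary)
```

Here the uniform covering built on the largest cluster overlaps ($N_n l^{(n)} > 1$), while the shrunk intervals satisfy both inequalities stated above with the same $N_n$.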

Our definition of optimal partition is a finite precision realization of an "optimal" $\delta$-covering (a
$\delta$-covering approximates the "infimum"), as is used in the mathematicians' definition of "Hausdorff
measure" ([19]). The method in the cosmology literature that may come nearest to ours, in spirit,
is the minimal spanning tree method ([74]). In dynamical systems theory the optimal partition
is called the generating partition and provides the geometric, or finite precision, definition of a
fractal.

Here's an idealized example of a dynamical system with a nonuniform optimal partition ([53]).
The generating partition of the asymmetric tent map with slope magnitudes $a$ and $b$ is given by
$N_n = 2^n$ intervals $l_m = a^{-m} b^{-(n-m)}$, where $m = 0, 1, 2, \ldots, n$, and describes the two-scale Cantor
set (there are two first generation scales $l_1 = a^{-1}$ and $l_2 = b^{-1}$). The idealized data set consists
of interval end points and also of limits of infinite sequences of interval end points (the latter
corresponding to infinitely-many backward iterations of the unit interval by the asymmetric tent
map). This description of the asymmetric tent map is correct if $a^{-1} + b^{-1} \le 1$ ($a^{-1} + b^{-1} = 1$
means space-filling, while $a^{-1} + b^{-1} < 1$ produces voids and clustering, representing an idealized
nonuniform "Cantor dust" ([?])). One object of [16] was to extract the generating partition of the
Henon map.

The generalization of the scaling law (1) to our hierarchy of nonuniform optimal partitions $\{l_i\}$
is

$$\sum_{i=1}^{N_n} l_i^{D_H} \approx 1, \qquad (2)$$

where the scaling index $D_H$ is called the Hausdorff dimension ([?]). Whether a data set has a
Hausdorff dimension can only be answered empirically, by constructing the optimal partition and
checking to see whether (2) holds over many different generations $n$ with the same exponent $D_H$.

The Hausdorff dimension of the two-scale Cantor set is given exactly in the first generation by
$a^{-D_H} + b^{-D_H} = 1$. In the empirical case, in contrast, $N \gg N_n \gg 1$ is usually required in order that
we have enough data points to see scaling via either (1) or (2) which, as Mandelbrot pointed
out early in his book, is usually confined to an intermediate range of interval sizes $l^{(n)}$ where
$l_{\max} \gg l^{(n)} \gg l_{\min}$. Here, $l_{\max}$ is on the order of the size of the sample, $l^{(n)}$ is the largest of the
$N_n$ nth generation intervals $\{l_i\}$, and $l_{\min}$ is in our case the smallest distance between two points
in the sample. In general, we should not expect to observe scaling unless the largest intervals are
much smaller than the size of the system, and unless all of the smallest ones contain more than a
single data point. However, the larger limit may not be applicable to present astronomical data
because the systems of galactic clusters and voids are presumably much larger than any available
sample size. Also, there is nothing to prevent our checking for scaling all the way down to $l_{\min}$.
Again, if an inefficient partition is chosen then scaling may not be observed even if the data set is
fractal, because the number $n$ of generations needed for "convergence" to a scaling exponent $D_0$
or $D_H$ may exceed the number $N$ of data points in the sample.
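For the two-scale Cantor set the statement that the first generation fixes the Hausdorff dimension exactly can be checked numerically. The following is a minimal sketch under stated assumptions: the slope magnitudes $a = 3$, $b = 5$ are hypothetical, the function names are not from the original, and bisection is used only as a convenient root finder. It solves $a^{-D} + b^{-D} = 1$ and then evaluates the sum of $l_i^{D}$ over the $2^n$ intervals of later generations.

```python
import math

def hausdorff_dimension(a, b, tol=1e-12):
    """Solve a**-D + b**-D = 1 for D by bisection (assumes a, b > 1 and 1/a + 1/b <= 1)."""
    lo, hi = 0.0, 1.0  # the left side equals 2 at D = 0 and 1/a + 1/b <= 1 at D = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a ** -mid + b ** -mid > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def partition_sum(a, b, n, beta):
    """Sum of l_i**beta over the 2**n intervals l = a**-m * b**-(n-m), grouped by m."""
    return sum(math.comb(n, m) * (a ** -m * b ** -(n - m)) ** beta
               for m in range(n + 1))

a, b = 3.0, 5.0  # hypothetical slope magnitudes with 1/3 + 1/5 < 1
d_h = hausdorff_dimension(a, b)
for n in (1, 2, 5, 10):
    print(n, round(partition_sum(a, b, n, d_h), 9))
```

The sum stays at unity in every generation for this idealized set. For empirical data the same check would have to be made on the constructed optimal partition, with no guarantee that a single exponent works.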

All definitions and approximations based upon the $l \to 0$ limit are systematically avoided
because, as we explain below, they lead to formulae that generally do not apply to finite data sets.

Suppose that we have found the optimal nonuniform partition for a data set. If we replace the
uneven intervals in a nonuniform partition $\{l_i\}$ by the largest scale of each generation (simply call
it $l^{(n)}$), then we obtain a less optimal uniform covering defined by

$$N_n \left(l^{(n)}\right)^{D_0} \approx 1. \qquad (3)$$

Because $N_n$ is the same as in (2), and because $l^{(n)} \ge l_i$, it follows that $D_0 \ge D_H$, where $D_0$ is
called the box-counting dimension. This procedure defines the most efficient uniform partitioning
of the data set if the clustering is relatively uniform. In other words, whether or not $D_0$ provides
a good estimate of $D_H$ for low values of $n \ll N$ depends on whether or not the uniform partition
closely approximates the nonuniform optimal one. This replacement amounts to the pointwise
approximation of a nonuniform fractal data set by a monofractal. The resulting idealized data set
can be thought of as consisting of the end points of the uniform intervals $l^{(n)}$. For the two-scale
Cantor set with $a < b$ the box-counting dimension is $D_0 = \ln 2/\ln a$. In the two-scale Cantor set
$D_0 = \ln 2/\ln a$ and $a^{-D_H} + b^{-D_H} = 1$ yield $D_0 \approx D_H$ only if $a$ and $b$ are approximately equal.
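How quickly $D_0$ departs from $D_H$ as the two scales separate can be seen from a small comparison. The slope pairs below are hypothetical, and the helper names are assumptions of this sketch; $D_H$ is again obtained by bisection on $a^{-D} + b^{-D} = 1$.

```python
import math

def hausdorff_dimension(a, b, iterations=200):
    """Bisection solution of a**-D + b**-D = 1 on 0 <= D <= 1."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if a ** -mid + b ** -mid > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def box_counting_dimension(a, b):
    """D_0 = ln 2 / ln min(a, b): every interval is replaced by the largest one, min(a, b)**-n."""
    return math.log(2.0) / math.log(min(a, b))

results = {}
for a, b in [(3.0, 3.0), (3.0, 3.3), (3.0, 9.0)]:  # hypothetical slope pairs
    d0, dh = box_counting_dimension(a, b), hausdorff_dimension(a, b)
    results[(a, b)] = (d0, dh)
    print(f"a = {a}, b = {b}: D_0 = {d0:.4f}, D_H = {dh:.4f}, D_0 - D_H = {d0 - dh:.4f}")
```

For equal slopes the two dimensions coincide, for nearly equal slopes they differ by a few hundredths, and for $b = a^2$ the box-counting value overestimates $D_H$ substantially.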

How many generations are necessary in order to convince hardened sceptics that scaling has
been observed? In both critical phenomena and dynamical systems theory the rule of thumb is
three decades on a log-log plot, requiring astronomical data, e.g., from 0.1 to 100 $h^{-1}$ Mpc, or from
1 to 1000 $h^{-1}$ Mpc depending on the smallest distance reported in a given catalog of galaxies. With
$l^{(n)} \approx a^{-n}$, $a > 2$, we would need $n \approx 3\ln 10/\ln a$ generations. We call this criterion "the Geilo
Criterion" because it was suggested at a Geilo NATO-ASI. The Geilo criterion is not a matter of
taste: it is advocated in order to deflect erroneous reports of scaling like that indicated in figure
1 over only two decades, $1 < r < 100$.
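As a quick arithmetic illustration of the criterion, the number of generations needed to span three decades follows directly from $l^{(n)} \approx a^{-n}$. The contraction factors below are hypothetical examples and the function name is an assumption of this sketch.

```python
import math

def generations_for_decades(a, decades=3):
    """Generations n for which l = a**-n spans the given number of decades: n = decades*ln(10)/ln(a)."""
    return decades * math.log(10.0) / math.log(a)

for a in (3, 5, 10):
    n = generations_for_decades(a)
    # Small guard so that an exact integer result is not rounded up by floating-point noise.
    print(f"a = {a}: n = {n:.2f} (at least {math.ceil(n - 1e-9)} generations)")
```

Smaller contraction factors therefore demand more generations of a hierarchy, and correspondingly more data points, before three decades are covered.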

Thinking of the optimal partition of a fractal as organized onto a tree of some order, if we
write $N_n = t^n$ then the order of the tree is the next integer greater than or equal to $t$. If $t$ is
nonintegral then the tree is incomplete. If $l^{(n)} = a^{-n}$ then $t = a^{D_0}$. With $a > 2$ the $\beta$-model of
fluid turbulence lies on an incomplete tree that is at least octal. See [11] for a discussion of
the $\beta$-model in cosmology.

3.4 Multifractal scaling (via a nonuniform optimal partition)

A data set that obeys scaling (2) with a nonuniform optimal partition may be "multifractal".
Multifractal is always defined here to mean a spectrum of fractal dimensions ([?]). Each dimension
in a multifractal spectrum describes the scaling of a subset of the optimal partition (a nonuniform
fractal is decomposed disjointly into a union of other fractals).

Given the optimal partition a multifractal spectrum $D(\lambda)$ may be defined by parameterizing the
hierarchy of coarsegrained intervals (the parameter here is $\lambda$) so that the partition is organized into
(nonoverlapping but) interwoven sub-partitions ([?]). Each fractal dimension in the multifractal
spectrum is the Hausdorff dimension of one subset of the partition (a multifractal is always the
union of a complete set of nonoverlapping but interwoven fractals labeled systematically by some
index). To obtain scale invariance the interval sizes $l_n$ must contract exponentially as $n$ increases.
Both the nth generation intervals and their contraction rates are generally nonuniform: as an
oversimplified example let $l_i = a_i^{-n}$ denote the ith of $N_n$ intervals in generation $n$. This defines
a simple nonuniform Cantor set based on $N_1$ different first generation scales $l_i = a_i^{-1} = e^{-\lambda_i}$ if
$l_1 + \ldots + l_{N_1} < 1$ (in a chaotic dynamical system the contraction rate $l_i^{(n)} \approx e^{-n\lambda_i}$ describes the
intervals of the generating partition only asymptotically and approximately for $n \gg 1$, representing
the inverse butterfly effect for integration backward in time along unstable manifolds ([53])).

Suppose that there are $N(\lambda)$ intervals with the same contraction exponent $\lambda$ (in a dynamical
system $\lambda$ is the Liapunov exponent for forward evolution in time, starting from a specific class of
initial conditions, namely, all initial conditions that yield the same Liapunov exponent $\lambda_i$). Then
$N(\lambda) = l(\lambda)^{-D(\lambda)}$ (where $l(\lambda) = e^{-n\lambda}$) defines the Hausdorff dimension $D(\lambda)$ of the subset of the
partition labeled by $\lambda$. We can generalize (^|) by writing down the generating function (p5|)

$$Z_n(\beta) = \sum_{i=1}^{N_n} l_i^{\beta} = \sum_{\lambda} N(\lambda)\, l(\lambda)^{\beta} \approx \sum_{\lambda} e^{n(s(\lambda) - \beta\lambda)} \qquad (4)$$

with $N(\lambda) \approx e^{ns(\lambda)}$ for large enough $n$, and where $Z_n(D_H) = 1$ defines the Hausdorff dimension
of the entire fractal (by @). Note that $D_H > D(\lambda)$ because $N(\lambda) < N_n$. This simple fractal,
seen as multifractal, is the union of a complete set of interwoven, nonoverlapping monofractals
(neighboring branches on each generation of the tree generally are labeled by different indices $\lambda$).

In the oversimplified model above we have $Z_n(\beta) = \left(\sum_{i=1}^{N_1} l_i^{\beta}\right)^n$, where the sum runs over the
first generation scales.

As an example ([53]), consider the two-scale Cantor set (with $N_1 = 2$ first generation intervals
$l_1 = a^{-1} > l_2 = b^{-1}$). In generation $n$ the optimal covering is given by the $N_n = 2^n$ intervals
with sizes $l_1^m l_2^{n-m}$, of which $N_m = n!/(m!(n-m)!)$ have the same size $l_m = l_1^m l_2^{n-m}$. Here, $\lambda =
x\ln a + (1-x)\ln b$ with $x = m/n = 0, 1/n, \dots, 1$. In each generation $n$ there are $n+1$ points in
the multifractal spectrum, not more, and the number of points grows only linearly with $n$ (still,
this eliminates bi-fractals, tri-fractals, ..., and $N$-fractals from our definition of "multifractal").
Using Stirling's approximation we find, with $x = m/n$, that $N(\lambda) \approx e^{ns(\lambda)}$ where

$$s(\lambda) \approx -x\ln x - (1-x)\ln(1-x) \qquad (5)$$

is the Boltzmann entropy (divided by $n$) of all intervals with the same contraction exponent $\lambda$.
We then have $D(\lambda) = s(\lambda)/\lambda$, which shows the connection of fractal dimension to entropy ($s(\lambda) = \ln 2$
and $\lambda = \ln 3$ for the middle thirds Cantor set). Since $t(\lambda) = e^{s(\lambda)}$ the tree is generally binary but
incomplete for any monofractal subset labeled by the contraction index $\lambda$ in the two-scale Cantor
set.
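
The finite-$n$ spectrum of the two-scale Cantor set can be tabulated directly. The sketch below is
illustrative only (the function name and the sample values $a = 3$, $b = 9$ are our assumptions): it
compares the exact entropy $\ln N_m / n$, obtained from the binomial count, with the Stirling form (5),
and forms $D(\lambda) = s(\lambda)/\lambda$ for the $n - 1$ interior points $m = 1, \dots, n-1$ (the two end points have
$s = 0$).

```python
import math

def two_scale_spectrum(a, b, n):
    """Rows (x, lam, D_exact, D_stirling) for m = 1..n-1 in generation n.

    Assumes first generation intervals l1 = 1/a and l2 = 1/b.
    """
    rows = []
    for m in range(1, n):
        x = m / n
        lam = x * math.log(a) + (1 - x) * math.log(b)
        s_exact = math.log(math.comb(n, m)) / n          # ln N_m / n
        s_stirling = -x * math.log(x) - (1 - x) * math.log(1 - x)
        rows.append((x, lam, s_exact / lam, s_stirling / lam))
    return rows

for x, lam, d_exact, d_stirling in two_scale_spectrum(3, 9, 10):
    print(f"x={x:.1f}  lambda={lam:.3f}  D_exact={d_exact:.3f}  D_stirling={d_stirling:.3f}")
```

At small $n$ the exact values lie visibly below the Stirling estimate, which is one way to see how
slowly the asymptotic formulae are approached.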

Note that "multifractal" is consistent with $D_H = 1$: the support may be space-filling, while
subsets of the support are fractal ($0 < D(\lambda) < 1$). In the two-scale Cantor set $D_H = 1$ corresponds
to the space-filling condition $a^{-1} + b^{-1} = 1$.

Ideas based on the generating function (^) have been used to analyze experiments on the 
transition to chaos in fluid dynamics (p6|). 

The notion that multifractal scaling can be verified for small data sets (p8[) is a misconception:
fewer data points $N$, with less precision, cannot be required for the determination of a spectrum
of fractal dimensions than are required for determining one dimension, say $D_H$. The error in
the claim follows from confusion and errors made in defining "multifractal" (see parts |], ^ and ^
below).

The term multifractal has occasionally been incorrectly defined in the cosmology literature,
where at least three entirely different generating functions are confused together (see parts ^| and [?])
as if their corresponding scaling exponents (whenever scaling exists) would define universality
classes independently of probability distributions and partitionings used to define those exponents.
In [13] a far too restrictive idea of multifractal is presented in part 6. Multifractal spectra were
introduced in analogy with critical exponents (IpOt), but the expectation of universality ([39]) was
not realized (a restricted and still unproven universality is merely postulated for vortex cascades
in fluid turbulence ([24])). In the theory of chaotic dynamical systems two examples of topologic
invariants are the tree order $t$ and its degree of incompleteness (p8|, [53]).

We always restrict our formulation of the requirement for fractal and multifractal scaling to
finite $l^{(n)}$ and to finitely many data points $N$, completely avoiding mathematically idealized results
that would require for their applicability the empirically and computationally unattainable limit
where $l^{(n)} \to 0$. We shall see in part 3.6 that this rules out largest term approximations, whose
validity would require values of $l^{(n)}$ that are far too small ($l^{(n)} \ll 1$, with the range of $l^{(n)}$
extending over at least three decades) to be consistent with the analysis of galaxy distributions.



3.5 The empirical distribution 

For an observational data set there is only one pointwise probability distribution, the empirical
distribution $P(x)$ defined by the $N$ data points: $P(x)$ is simply the fraction of points lying to the
left of (and including) $x$, so that $P(0) = 1/N$ and $P(1) = 1$, by construction. The distribution is
constant on the voids and increases discontinuously at each data point, so that the plot of $P(x)$
is a staircase of $N - 2$ steps of finite width. The data staircase has the singular pointwise density
$\rho(x) = P'(x)$ given by

$$dP(x) = \frac{dx}{N}\sum_{i=1}^{N} \delta(x - x_i) \qquad (6)$$

Of course, (6) is only a theorist's fiction: it neglects the error bars in the locations of positions.
In reality each position is specified empirically by a finite interval whose width is the uncertainty
in location. We assume here that these uncertainties are very small relative to the smallest
separation $l_{\min}$ between data points. Otherwise coarsegraining and fractal/multifractal analysis
are impossible. The density (6) will be used to correct a more serious theoretical error in part ^
below.

All that we need in what follows is the staircase $P(x)$ along with an empirical technique
for characterizing voids and clusters. We emphasize that attempts to "smooth" the staircase
(via "splines", e.g.) will discard important information about clustering. No pointwise probability
distribution other than the staircase $P(x)$,

$$P(x) = \frac{1}{N}\sum_{i=1}^{N} \theta(x - x_i) \qquad (7)$$

is relevant for empirical data analysis.
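
The staircase is simple to construct from a list of positions. The following is a minimal sketch
(the function name and the three sample positions are hypothetical): it returns $P(x)$ as the fraction
of data points lying to the left of, and including, $x$.

```python
from bisect import bisect_right

def empirical_staircase(points):
    """Return P, where P(x) is the fraction of data points <= x."""
    xs = sorted(points)
    n = len(xs)

    def P(x):
        # bisect_right counts the points lying to the left of (and including) x.
        return bisect_right(xs, x) / n

    return P

P = empirical_staircase([0.0, 0.4, 1.0])   # N = 3 sample positions
print(P(0.0), P(0.2), P(0.4), P(1.0))
```

$P$ is constant between data points (on the voids) and jumps by $1/N$ at each data point.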

An arbitrary distribution $P(x)$ of $N$ data points is generally not approximable in either the
continuum (infinite precision) or hydrodynamic (coarsegrained) limit by a differentiable distribution.
We shall find next that the coarsegrained versions of the empirical distribution $P(x)$ are
typically too spiky (too intermittent) to be approximable by an everywhere differentiable distribution
(reminding us more of "noise" than of analyticity), even if $D_H$ is an integer (even if the
optimal partition is space-filling). The spikiness/intermittence represents clustering and voids in
the sample. Hydrodynamics demands a coarsegrained description of a pointwise distribution, and
we will discuss in part @ how the required densities can be defined.
Figure 5: The idealized empirical distribution according to eq. (7) with $N = 3$.



On an optimal or at least efficient partition a coarsegrained probability $P_i$ is defined by the
difference $P_i = \Delta P(x)$ over the $i$th closed interval of size $l_i^{(n)}$, and is just the fraction $P_i = n_i/N$
of the total number of data points, with $n_i$ points in the $i$th interval including the end points. While each
empirical distribution $P(x)$ is a staircase of finitely many steps, each coarsegrained distribution
$\{P_i\}$ is a histogram on a finite support.

The optimal partition optimally defines the "support" of the hierarchy of coarsegrained empirical
distributions $\{P_i\}$: for the optimal partition, each interval end point coincides with a point
where the staircase function $P(x)$ increases discontinuously. Whether an empirically constructed
optimal partition is the generating partition of a deterministic dynamical system is a separate
question (the main question, but very hard to answer ( JTq] , f^j)).

The coarsegrained probabilities $\{P_i\}$ can be used to perform averages that ignore the details of
the dynamics at all scales smaller than $l^{(n)}$ (coarsegraining the smaller scales is required in order
to define hydrodynamics). The only limitation, so far, is that $l^{(n)} \gg l_{\min}$. Bear in mind,
however, that before reaching the smallest scale $l_{\min}$, as $n$ is increased, we may not be able to
distinguish clustering from voids without ambiguity. Even if clustering and voids are present at all
scales they may not be scale invariant. The construction of efficient and even optimal partitions
does not presume scale invariance; rather, the converse is true, especially in practice.

We have used the frequency definition of probabilities because it arises naturally in both
empirical data analysis and computer simulations. Using our simple example above, however, we
can offer as an idealized staircase distribution the Cantor function (pq|, [33])

$$P(x) = 0.\frac{\epsilon_1}{2}\frac{\epsilon_2}{2}\cdots\frac{\epsilon_N}{2}\cdots \qquad (8)$$

where, because $x = 0.\epsilon_1\epsilon_2\cdots\epsilon_N\cdots$ is a ternary number with $\epsilon_i = 0$ or 2, $P(x)$ is a binary number
(because $\epsilon_i/2 = 0$ or 1). This staircase describes a mathematician's idealization of empirical
data, namely, one (of infinitely many) distribution that can be constructed by using points in the
middle-thirds Cantor set: $P(0) = 0$, $P(1) = 1$, $P(x)$ is constant on the open voids and increases
discontinuously at the end point of any closed interval $l^{(n)} = 3^{-n}$ of the optimal covering, where
the change in $P(x)$ is $\Delta P = 2^{-n}$. Therefore, the Cantor function defines a hierarchy of uniform
coarsegrained distributions $P_i = \Delta P(x) = N_n^{-1} = 2^{-n}$, generation by generation in $n$, on the
fractal support $l^{(n)} = 3^{-n}$. The plot of the Cantor function, the mathematicians' idealization of
an empirical distribution, is called a devil's staircase because it has $2^{\infty}$ steps.
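
The digit rule behind the Cantor function is easy to state in code. The sketch below (illustrative
only; the function name is ours) takes a terminating ternary string of digits 0 or 2, returns the
point $x$ it represents, and returns $P(x)$ as the binary number obtained by halving each digit. Exact
rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def cantor_function(digits):
    """Map ternary digits (each 0 or 2) of x = 0.e1 e2 ... eN to (x, P(x))."""
    if any(e not in (0, 2) for e in digits):
        raise ValueError("points of the middle-thirds Cantor set use only digits 0 and 2")
    x = sum(Fraction(e, 3**k) for k, e in enumerate(digits, start=1))
    # Halving each ternary digit yields the binary digits of P(x).
    P = sum(Fraction(e // 2, 2**k) for k, e in enumerate(digits, start=1))
    return x, P

print(cantor_function([2]))      # x = 2/3
print(cantor_function([0, 2]))   # x = 2/9
```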
A good way to look for voids is simply to plot the empirical staircase: $P(x)$ is constant on
the voids, where there are no data points. This is illustrated by the idealized example described
above (see also figure 5). Unfortunately this method is not trivially generalizable to more than
one dimension.

The Cantor distribution reflects statistical independence based upon the two first generation
probabilities $p_1 = p_2 = 1/2$ for occupying the two first generation intervals, each of length $l^{(1)} =
1/3$, and is generated by the ternary tent map for a special class of initial conditions. For other
initial conditions the map yields other distributions. Nonuniform distributions that lack statistical
independence are trivially easy to construct via either the ternary Bernoulli shift or the ternary tent
map. For example, iterate the (ternary) initial condition $x_0 = .202002000200002\ldots$. Statistically
independent distributions where $p_1 > p_2$ can also be constructed with only a bit more effort (see
ch. 9 of H| for the method).

Summarizing, in the beginning there is only a collection of $N$ points (or a time series) generated
by some generally unknown dynamical system. We can construct the empirical distribution $P(x)$
immediately, but we cannot construct the coarsegrained distributions $\{P_i\}$ without first extracting
the optimal partition $\{l_i\}$. In dynamical systems theory the optimal partition is provided by the
generating partition. The generating partition, if it exists, is the signature of the dynamical
system because it shows how the dynamics coarsegrains phase space naturally. In contrast, the
histograms that appear on the optimal support can be produced by every system in the same
topologic universality class (symbolic dynamics is universal for all systems in the same universality
class (p8[, IH)), so that a particular statistical distribution $\{P_i\}$ cannot be the signature of a
particular dynamical system. Both the Henon map and the logistic map $f(x) = Dx(1 - x)$,
with $D_c < D < 4$, where $D_c$ is the period doubling critical point, belong to the same topologic
universality class (both the logistic map with $D > 4$ and the binary tent map belong to a separate
universality class). Both systems, although of different spatial dimension, generate the same range
of histograms (for corresponding classes of initial conditions), but on different supports. From the
perspective of both dynamical systems theory and the search for scale invariance the central
problem of data analysis is to extract the optimal partition of a particular set of data points. See
refs. pq ] and [56] for examples of the extraction of optimal partitions from data in fluid mechanics.

3.6 Multifractal scaling (via the empirical distribution P(x) on an efficient support)

A nonuniform distribution on a fractal support looks fractally fragmented (looks more and more
spiky as the support is viewed with finer and finer resolution $l^{(n)}$). Distributions on space-filling
supports may also be fractally fragmented, as we shall see below. A nonuniform distribution on
a uniform or nonuniform support (that is either fractal or space-filling) can be used to sort and
label fractal subsets of the support. Each subset has its own fractal dimension $d(\alpha)$, where $\alpha$ is the
labeling index [30]. Multifractal, in this paper, always means a spectrum of fractal dimensions,
where each dimension describes the scaling via (|l]) or (||) of a subset of the support of $P(x)$. We
will show in parts ^ and |?] why the attempt to use other definitions leads to confusion and failed
expectations (predictions of "dimensions" that are not the dimension of anything are discussed in
parts H and ||).

First, note that for a uniform distribution on a uniform support we can write $P_i = N_n^{-1} =
l_n^{D_0}$. To describe a nonuniform distribution on a uniform or nonuniform support in generation
$n$ of coarsegraining, we try to define scaling exponents $\alpha_i$ by writing $P_i = l_i^{\alpha_i}$, where the scaling
index $\alpha_i$ takes on the same value $\alpha_i = \alpha$ on $N(\alpha)$ different intervals which we denote (using very
sloppy but obvious notation) by $l_1, \dots, l_{N(\alpha)}$. Therefore, by (g), we can write

$$\sum_{i=1}^{N(\alpha)} l_i^{d(\alpha)} = 1 \qquad (9)$$

to define $d(\alpha)$ as the Hausdorff dimension of the subset of the support where $\alpha_i = \alpha$ (if, indeed, such
scaling holds). If we could accurately replace the optimal nonuniform partition by the largest term
$l(\alpha) = \max\{l_i : i = 1, \dots, N(\alpha)\}$ in the subset, then we would obtain $N(\alpha) = l(\alpha)^{-f(\alpha)}$, where
$D_H > f(\alpha) > d(\alpha)$. In other words, $f(\alpha)$ is the box-counting dimension for $N(\alpha)$ nonoverlapping
uniform intervals of size $l(\alpha)$. This replacement works only for relatively uniform sub-clustering.
Otherwise it is necessary to compute $d(\alpha)$.

Given a nonuniform empirical distribution $P(x)$ over an optimal uniform partition, whether
or not the histograms $\{P_i\}$ scale over a reasonable range of different sizes of $l_i$, $P_i = l_i^{\alpha_i}$, is the
main question for empirical data analysis. In cosmology $P_i = n_i/N$ is the fraction (out of a total
number $N$ of galaxies) of galaxies in the $i$th interval $l_i$.

In all data analysis and computer simulations there can be at most finitely many values of $\alpha$
and $f(\alpha)$ (and finitely many values of $\lambda$ and $D(\lambda)$ as well) because $N_n \ll N$ is finite ($N \approx 400$
for typical galaxy samples). However, the number of points in a spectrum will grow generation
by generation $n$ for a multifractal spectrum (again, this distinguishes multifractal from bi-fractal,
tri-fractal, ..., $N$-fractal) within the cutoff limits $l_{\max} \gg l^{(n)} \gg l_{\min}$. We can illustrate this
via a simple example of an $f(\alpha)$ spectrum, the one given by the two-scale Cantor set with first
generation probabilities $p_1 > p_2$ describing statistical independence ( IpOfl ) in all higher generations,
and optimal first generation intervals $l_1 = a^{-1}$ and $l_2 = b^{-1}$. In this case, fixing the scaling index
$\alpha$ picks out a monofractal, so that $d(\alpha) = f(\alpha)$ because all intervals in the subset have the same
size $l_m = l_1^m l_2^{n-m}$. By using Stirling's approximation on $n!$ (requiring $n \gg 1$), we then obtain,
with

$$\alpha = \frac{-x\ln p_1 - (1-x)\ln p_2}{x\ln a + (1-x)\ln b} \qquad (10)$$

that

$$f(\alpha) = \frac{-x\ln x - (1-x)\ln(1-x)}{x\ln a + (1-x)\ln b} \qquad (11)$$

where $x = m/n = 0, 1/n, 2/n, \dots, 1$ parameterizes both $\alpha$ and $f(\alpha)$. There are $n + 1$ points
in the spectrum so that $f_{\max}(\alpha) < D_H$, where $a^{-D_H} + b^{-D_H} = 1$ defines $D_H$. Note also that
$f(\alpha) = s(x)/\lambda(x)$, as expected.
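
The two-scale spectrum can be evaluated numerically. The sketch below is only an illustration under
stated assumptions (the function names and the sample values $a = 3$, $b = 9$, $p_1 = 0.7$ are ours): it
solves $a^{-D_H} + b^{-D_H} = 1$ by bisection and tabulates the finite set of points $(\alpha, f(\alpha))$ from (10)
and (11) for the interior values $x = m/n$, $m = 1, \dots, n-1$.

```python
import math

def hausdorff_dimension(a, b, tol=1e-12):
    """Solve a**(-D) + b**(-D) = 1 by bisection; assumes 1/a + 1/b <= 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The left side decreases monotonically in D.
        if a ** (-mid) + b ** (-mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f_alpha_points(a, b, p1, n):
    """Points (alpha, f) of eqs. (10) and (11) for m = 1..n-1."""
    p2 = 1.0 - p1
    pts = []
    for m in range(1, n):
        x = m / n
        lam = x * math.log(a) + (1 - x) * math.log(b)
        alpha = (-x * math.log(p1) - (1 - x) * math.log(p2)) / lam
        f = (-x * math.log(x) - (1 - x) * math.log(1 - x)) / lam
        pts.append((alpha, f))
    return pts

D_H = hausdorff_dimension(3, 9)
f_max = max(f for _, f in f_alpha_points(3, 9, 0.7, 10))
print(D_H, f_max)
```

For these sample values the largest tabulated $f(\alpha)$ stays below $D_H$, in line with the remark above.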

We can summarize our present terminology by writing down either the generating function (@)

$$\chi_n(q) = \sum_{i=1}^{N_n} P_i^q \qquad (12)$$

or the generating function ([54])

$$\Gamma_n(q) = \sum_{i=1}^{N_n} \frac{P_i^q}{l_i^{\tau}} \approx 1 \qquad (13)$$

where, in the empirical search for scaling laws, it is first necessary to find an approximately
optimal partition in order correctly to extract a multifractal spectrum of dimensions $f(\alpha)$, or even
one fractal dimension $D_H$. Otherwise the convergence requirements (the number $n$ of generations
in a hierarchy $\{l^{(n)}\}$ of interval sizes) needed to see scaling with an approximately correct exponent
almost always outrun the limitation placed by the number $N$ of points in an empirical sample.

When multifractal scaling can be shown to hold over enough different generations $n$ of interval
sizes $l^{(n)}$, then we can also write

$$\chi_n(q) \approx \sum_{\alpha} l(\alpha)^{q\alpha - f(\alpha)} \qquad (14)$$

and

$$\Gamma_n(q) \approx \sum_{\alpha} l(\alpha)^{q\alpha - f(\alpha) - \tau} \approx 1. \qquad (15)$$

Note that (13) generalizes (2) so that we can explicitly discuss nonuniform distributions of points
on the support, as with (12).

By using $P_i = N_n^{-1}$ in (13) we get a result that looks formally like the generating function $Z_n(\beta)$
in (Q) above if we set $\beta = -\tau$ and $N_n^q = Z_n(\beta)$. Note, however, that no assumption about the
distribution of points $\{P_i\}$ over the support $\{l_i\}$ was necessary in order to define the generating
function (Q). In dynamical systems theory the generating function $Z_n(\beta)$ can be rewritten via
symbolic dynamics as the partition function for a one dimensional Ising model (p4[) with long
range interactions in equilibrium statistical mechanics ($\beta$ is then the inverse temperature).

Most discussions of multifractals inevitably stress that the generating functions (12) and (13)
should themselves scale approximately in the limit $l^{(n)} \to 0$ where, due to domination of the entire
sum by $N(\alpha)$ largest terms (|3Cj, ||3)), all of the same size (and parameterized by $q$),

$$\chi_n(q) \approx l(\alpha)^{q\alpha(q) - f(\alpha(q))} \qquad (16)$$

or

$$\Gamma_n(q) \approx l(\alpha)^{q\alpha(q) - f(\alpha(q)) - \tau(q)} \approx 1 \qquad (17)$$

where (because $n$ goes to infinity as $l^{(n)} \to 0$) we would hypothetically obtain an $f(\alpha)$ curve
parameterized by $q$. In this case, because there are enough points in the spectrum that $f(\alpha)$ may
be differentiated accurately, the "generalized dimensions" $D_q$ can be defined by $\tau(q) = (q - 1)D_q =
q\alpha(q) - f(\alpha(q))$, where $\alpha = \tau'(q)$ is the slope in the plot of $\tau$ vs. $q$ and $q = f'(\alpha(q))$ is the slope
of the $f(\alpha)$ curve. This continuum limit is misleading because it is generally inapplicable to data
analysis.

To see this, merely note that both generating functions are sums over all possible scales,

$$\chi_n(q) \approx l(\alpha_1)^{q\alpha_1 - f(\alpha_1)} + \dots + l(\alpha_k)^{q\alpha_k - f(\alpha_k)} \qquad (18)$$

(or, in the case of a nonuniform distribution on a monofractal, over $N_n$ terms $l^{\alpha_i}$ with different
exponents $\alpha_i$). For finite $l^{(n)}$ the generating functions (14) and (15) cannot scale approximately
unless $l^{(n)}$ is small enough, $l^{(n)} \ll 1$ (we cannot emphasize this requirement too strongly),
that a largest term approximation is accurate, which is generally not the case. Formulations and
expectations of scaling based on the $l^{(n)} \to 0$ limits (16) and (17) have been taken seriously enough
to have been employed during data analysis within the cosmology community (p7| ), as elsewhere.
In data analysis this approximation is usually a very bad one (see Theiler [73] for a clear and
comprehensive exposition of the usual assumptions made in discussions of multifractal generating
functions).
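
How poor a largest term approximation can be at small $n$ is easy to check on a model where the sum
is known in closed form. The sketch below is an illustration under our own assumptions, not the
analysis of any cited work: it takes a binomial distribution with $P = p_1^m p_2^{n-m}$ on $\binom{n}{m}$ intervals,
so that $\chi_n(q) = (p_1^q + p_2^q)^n$, and compares the full sum with its single largest term. The function
name and the sample values $p_1 = 0.7$, $q = 2$ are hypothetical.

```python
import math

def largest_term_error(p1, q, n):
    """Return (fraction of chi_n(q) carried by its largest term,
    relative error made in ln chi_n(q) by keeping only that term)."""
    p2 = 1.0 - p1
    terms = [math.comb(n, m) * p1 ** (q * m) * p2 ** (q * (n - m))
             for m in range(n + 1)]
    total = sum(terms)                 # equals (p1**q + p2**q)**n
    largest = max(terms)
    rel_error = math.log(largest) / math.log(total) - 1.0
    return largest / total, rel_error

for n in (5, 10, 100):
    print(n, largest_term_error(0.7, 2, n))
```

On a uniform support $l^{(n)} = a^{-n}$ the relative error in $\ln\chi_n$ is also the relative error in the
exponent read off from a log-log plot. It shrinks only slowly as $n$ grows.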

In typical data analyses found in the literature the largest term approximation is implicit in any
plot of the logarithm of a generating function vs. $\ln l^{(n)}$ in the search for generalized dimensions
$D_q$. We expect that most empirical data will not produce small enough values of $l^{(n)}$ for a largest
term approximation to be applicable. Even if multifractal scaling should hold term by term
in the form of (14), it cannot be discovered by a plot of $\ln\chi_n$ vs. $\ln l^{(n)}$. Instead, one must
check for scaling term by term inside the sum (12). In other words, forget the sums (16) and (17)
and check each term separately for scaling, to verify whether $P_i = l_i^{\alpha_i}$ with $N(\alpha) = l(\alpha)^{-f(\alpha)}$ actually
holds over a Geilo-range of scale sizes. The generating functions are not directly measurable
anyway, so one needn't care whether or not they scale.

The coarsegrained density defined by $\rho_i = P_i/l_i = l_i^{\alpha_i - 1}$ is typically singular. Even if the
support is space-filling (even if $D_H = 1$) the coarsegrained density will look more and more
intermittent as the resolution is improved if the distribution $P(x)$ has nonuniform clusters that
are scale invariant (see [56] for a one dimensional example from fluid turbulence). Any attempt
to replace a staircase $P(x)$ by a distribution with a differentiable density may delete, mask, or, at
best, unnecessarily complicate the description of clustering and intermittence. Why introduce the
mathematical fiction of a continuous distribution when observation gives us tractable discreteness
directly?

4 The correlation integral 

Figure 6: A sketch of the sample, illustrating which galaxies we use (solid circle) and don't use
(dashed circle) in the search for scale invariance.

In cosmology ([^o), [13], H), as it was in empirical analyses of dynamical systems ten years ago
(before the partitioning ([21]) and recycling ([16]) of strange sets), it is usual to work with the
correlation integral

$$n(r) = \frac{1}{N}\sum_{i=1}^{N} n_i(r) \qquad (19)$$

where "the correlation integrand"

$$n_i(r) = \frac{1}{N}\sum_{j \neq i} \theta(r - |x_i - x_j|) \qquad (20)$$

is the fraction of galaxies in a sphere of radius $r$, centered on the $i$th of $N$ galaxies. Here, we take
$0 < r < 1$. This means that the original dimensional variable $r$ for each galaxy has been rescaled
by dividing it by $r_{\max,i}$, where $r_{\max,i}$ is the value of the unscaled variable $r$ for which the sphere of radius
$r$, centered on galaxy $i$, just touches the boundary of the data set. In other words, spheres of
every radius $r$ lie completely within the boundaries of the data (see figure 6). We stress that data
sets should not be "extended" by adding points beyond the boundaries of the observational data
during box-counting. To do so would change the data set from the one that we set out to analyze.
In other words, we agree with the Pietronero school p3| , but in part ^| we will show how to refine
the data analysis to eliminate a certain (self-) inconsistency in that work.
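
The counting rule just described can be sketched as follows. This is a minimal illustration under
assumptions of our own: the sample volume is taken to be a cube with faces at `lower` and `upper`,
radii are left unscaled (no division by $r_{\max,i}$), and the function name is hypothetical. A sphere
that crosses the boundary is simply not used.

```python
import math

def correlation_integrand(points, i, r, lower=0.0, upper=1.0):
    """Fraction of the N points within distance r of point i (point i excluded).

    Returns None when the sphere of radius r about point i is not entirely
    inside the cubic sample volume [lower, upper]^d.
    """
    xi = points[i]
    # Largest radius for which the sphere about point i stays inside the cube.
    r_max = min(min(c - lower, upper - c) for c in xi)
    if r > r_max:
        return None
    count = sum(1 for j, xj in enumerate(points)
                if j != i and math.dist(xi, xj) <= r)
    return count / len(points)

pts = [(0.5, 0.5, 0.5), (0.6, 0.5, 0.5), (0.05, 0.5, 0.5)]
print(correlation_integrand(pts, 0, 0.2))   # sphere lies inside the sample
print(correlation_integrand(pts, 2, 0.1))   # sphere crosses the boundary
```

No boundary correction is applied and no points are added outside the sample.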

Whenever the distribution in (20) exhibits statistical independence then $n(r) = \chi_n(2)$ holds
as well, as is implicit in standard treatments. Clearly data that are statistically independent
on an optimal partition (like the Cantor function differences $P_i = 2^{-n}$ over the closed intervals
$l^{(n)} = 3^{-n}$) will not show statistical independence over an arbitrary partition. In the limit of
small length scales the generalized dimension $D_2$ coincides with the correlation integral dimension
$\nu$ for the case of an empirical distribution $P(x)$ that exhibits statistical independence on its
optimal coarsegrained support ( pi| ), but we cannot merely assume statistical independence of
observational data, and the zero length limit is anyway physically unattainable. Therefore we do
not expect to extract $D_2$ via data analysis. Loosely speaking, however, one can refer to $\nu$ as the
"correlation dimension". Note that $\nu < 3$ requires either a multifractal distribution on a fractal
support, or, for a nearly uniform distribution, a fractal support.

The generalization of (19) is given by the (not directly measurable) generating function

$$G_n(q) = \frac{1}{N}\sum_{i=1}^{N} n_i(r)^{q-1}. \qquad (21)$$

As in (19), $N$ is not the number of intervals in a nonoverlapping efficient partition, but is the
number of points in the data sample. The correlation integral was first emphasized in the literature
because it appears to allow us to circumvent the need to find an optimal partitioning. We will return
to this point shortly.

Our first main point is that whether one uses (12), (13) or (21) to study galaxy counts is
irrelevant: a generating function cannot scale unless all terms in the sum scale separately, and
only then if one term dominates. If scaling holds locally but the data set is not monofractal, then
each term in the correlation integral (19) must have the form

$$n_i(r) \approx r^{\nu(i)} \qquad (22)$$

where $\nu(i)$ is the local correlation integral index, the scaling exponent (formally somewhat
analogous to $\alpha_i$ in equation (^), except that here there is no accurate way to define a spectrum of
fractal dimensions $f(\nu)$ because an efficient partition is not defined by (21)) for galaxy counts for
$n = 1, 2, \dots, n_{\max}$ spheres with corresponding radii $r$ centered on galaxy $i$. Clearly, in the
absence of largest term dominance the correlation integral representing local scaling with $N$ different
indices $\nu(i)$,

$$n(r) \approx \frac{1}{N}\sum_{i=1}^{N} r^{\nu(i)}, \qquad (23)$$

is not scale invariant because each term in the sum scales differently: $(\lambda r)^{\nu(i)} = \lambda^{\nu(i)} r^{\nu(i)}$. If,
in a plot of $\log n(r)$ vs. $\log r$, linearity is reported for a large enough range of values of $r$ that
the result is not spurious (see Figure [j]), then the likelihood is that all terms inside the sum have
approximately the same scaling exponent $\nu(i) \approx \nu$, indicating that the data set is approximately
monofractal. For example, with $N$ points distributed uniformly over the $N_n = 2^n$ intervals of the
optimal partition $l^{(n)} = 3^{-n}$ of the middle thirds Cantor set, the correlation integral is dominated
by a single term

$$n(r) = n_i(r) = 2^{-n} - 2^{-N} \qquad (24)$$

and scales when $N \gg n$, $n(r) \approx 2^{-n} = r^{D_0}$ where $D_0 = \ln 2/\ln 3$ because $r = 3^{-n}$. In other words,
$n(r)$ is approximately scale invariant for $N \gg n$ because every term $n_i(r)$ in the sum (19) is the
same and is also scale invariant. Here, $D_0 = D_2 = \nu$ holds because we have (implicitly chosen to
use) a uniform distribution on a monofractal. In particular the analysis of part ^ shows that one
cannot assume that $n_i(r) = r^{\nu} + \delta n_i(r)$, where $\delta n_i(r)$ is Poisson noise.
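
That a sum of different power laws is not itself a power law can be checked in a few lines. The
sketch below is illustrative only (the function name and the sample exponents are ours): it evaluates
the local slope $d\ln n/d\ln r$ of $n(r) = N^{-1}\sum_i r^{\nu(i)}$ by a centered difference in $\ln r$.

```python
import math

def local_slope(exponents, r, eps=1e-3):
    """Centered-difference estimate of d ln n / d ln r,
    where n(r) is the mean of r**nu over the given exponents."""
    def ln_n(radius):
        return math.log(sum(radius ** nu for nu in exponents) / len(exponents))
    return (ln_n(r * (1 + eps)) - ln_n(r * (1 - eps))) / (
        math.log(1 + eps) - math.log(1 - eps))

# Monofractal case: the slope is the same at every r.
print(local_slope([1.5, 1.5], 0.9), local_slope([1.5, 1.5], 0.001))
# Two different local exponents: the slope drifts as r decreases.
print(local_slope([1.0, 2.0], 0.9), local_slope([1.0, 2.0], 0.001))
```

With two different exponents the slope approaches the smaller one only as $r \to 0$, which is the
largest term limit. At larger $r$ no single exponent describes the plot.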

In the analysis of empirical data, on the other hand, if scaling of (21) is reported but has only
been observed over a non-Geilo range of values of $r$ then the resulting spectrum of generalized
correlation dimensions $\nu_q$ in $\tau(q) = (q - 1)\nu_q$ defined by

$$G_n(q) = \frac{1}{N}\sum_{i=1}^{N} n_i(r)^{q-1} \approx r^{\tau(q)} \qquad (25)$$

may be spurious (and appears only in the unphysical limits where $N \to \infty$ and $r \to 0$). The
variation of $\tau(q)$ with $q$ obtained from a plot of $\ln G_n(q)$ vs. $\ln r$ over an inadequate range of
$r$-values probably indicates that the generating function (21) does not scale. In [g] it is shown how
one can even get a spurious "generalized dimension" spectrum from log-log plots of a Gaussian
distribution.

We point out next that the hope that one could circumvent the need to extract the optimal
partition from the empirical data was an illusion: the generating function (21) cannot be used
to compute either a Hausdorff or box-counting dimension. Setting $q = 0$ in (25), the standard
approach would lead to the expectation that $\nu_0$ in

$$G_n(0) = \frac{1}{N}\sum_{i=1}^{N} n_i(r)^{-1} \approx r^{-\nu_0} \qquad (26)$$

provides an estimate for the box counting dimension $D_0$. This is impossible, because neither the
box counting nor information dimension is included in the $\nu_q$ spectrum (appeals to the limit of
vanishing $r$ (pW) do not help in the empirical case). The reason is simple: the terms on the left
hand side of (26) don't define an efficient, nonoverlapping partition of $G_n(0)$ intervals, each of
size $r$. Hence the "convergence" difficulties reported by |Q in the attempt to estimate the box
counting dimension by computing $\nu_0$.

If we would try instead to define an interval by formally writing $r^{\nu_0}\, n_i(r)^{-1} = r_i^{d}$ in (26),
then the result

$$\frac{1}{N}\sum_{i=1}^{N} r_i^{d} \approx 1 \qquad (27)$$

reminds us superficially of equation (^) above, but $d$ does not define a Hausdorff dimension: the
$N$ intervals overlap very badly with each other because the sum is over all $N$ galaxies instead
of over an efficient nonoverlapping partition. Equation (27) was proposed by Martinez (fH) as
one that yields $D_H$, as well as the $D_q$ spectrum for $q < 1$ via the generalization (see also E3)

$$W_n(\tau) = \frac{1}{N}\sum_{i=1}^{N} r_i^{-\tau} \approx p^{1-q} \qquad (28)$$

but information about the spectrum $D_q$, aside from an estimate of $D_2$, is not included in these
formulae.

Equation (27) is supposed to be based on the equation

$$\langle r \rangle = \frac{1}{N}\sum_{i=1}^{N} r_i \qquad (29)$$

with

$$\langle r \rangle \approx K N^{-1/D} \qquad (30)$$

discussed by [?], where $r_i$ is the nearest neighbor distance between two points in a sample
consisting of $N$ points. However, (29) is not the same as (27) because the partitioning in (29) is
nonoverlapping: each point is connected only to one other point. For any finite subset of the
middle-thirds Cantor set consisting only of end points, we find that $K^D = 2$ because the number
of end points $N = 2^{n+1}$ is simply twice the number of intervals $N_n = 2^n$ required to cover those
points. In this case the required nearest neighbor intervals are simply the usual nonoverlapping
intervals $r_i = 3^{-n}$ of the middle-thirds Cantor set. Notice also that one cannot stray far from
the optimal result $r_i = 3^{-n}$ by using nonterminating ternary expansions $x = 0.\epsilon_1 \dots \epsilon_N \dots$ with
$\epsilon_i = 0$ or 2 to generate $N$ points of the middle-thirds Cantor set rather than by using only
terminating ternary strings. While scaling holds exactly when we use nearest neighbor distances in (29)
and (30) for the mathematical idealization of a Cantor set, we should not expect an analogous
"convergence rate" when we use empirical data. Equations (29) and (30) should only be expected
to yield an accurate scaling index $D \approx D_H$ when the nearest neighbor distances are taken to be
the end points of an optimal partition.
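
The idealized Cantor set case can be verified exactly. The sketch below is an illustration that
assumes the nearest neighbor relation in the form $\langle r \rangle \approx K N^{-1/D}$; the function names are ours. It
builds the end points of the generation-$n$ intervals of the middle-thirds Cantor set with exact
rational arithmetic, computes every nearest neighbor distance, and evaluates $K^D$ with
$D = \ln 2/\ln 3$.

```python
from fractions import Fraction
import math

def cantor_endpoints(n):
    """Sorted end points of the 2**n generation-n middle-thirds intervals."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            refined.append((lo, lo + third))   # keep the left third
            refined.append((hi - third, hi))   # keep the right third
        intervals = refined
    return sorted(p for interval in intervals for p in interval)

def nearest_neighbor_distances(xs):
    """Distance from each point of the sorted list xs to its nearest neighbor."""
    out = []
    for k, x in enumerate(xs):
        gaps = []
        if k > 0:
            gaps.append(x - xs[k - 1])
        if k < len(xs) - 1:
            gaps.append(xs[k + 1] - x)
        out.append(min(gaps))
    return out

n = 4
xs = cantor_endpoints(n)
r = nearest_neighbor_distances(xs)
D = math.log(2) / math.log(3)
K = float(sum(r) / len(r)) * len(xs) ** (1 / D)
print(len(xs), set(r), K ** D)
```

Every nearest neighbor distance equals $3^{-n}$ and the number of end points is $2^{n+1}$, so the scaling is
exact here. Nothing comparable should be expected of empirical data.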

The expectation (Q) that different generating functions can be used to compute "the
$D_q$ spectrum" for different ranges of $q$, even in the idealized limit where $l^{(n)} \to 0$, is based
on three unfulfilled expectations. First, the box counting dimension does not belong to the $\nu_q$
spectrum (neither does the information dimension). Second, the $\nu_q$ spectrum (defined by (25) in
the limit of infinitely small $r$) does not coincide with the $D_q$ spectrum of (12) and (13) (which
do include the box counting and information dimensions as $l^{(n)} \to 0$). Third, one cannot change
probabilities and supports and expect scaling exponents to remain invariant: neither multifractal
spectra $f(\alpha)$, nor generalized dimensions $D_q$ derivable from multifractal spectra, can be used to
define universality classes. The misconception that an optimal partition is unnecessary has led to
the expectation that partitions can be manipulated without changing $f(\alpha)$ and $D_q$. Multifractal
spectra and generalized dimensions are nonuniversal: they change whenever either the support or
the histograms on that support are changed. This is easily seen via the simplest possible examples.

To emphasize this last assertion we demonstrate what happens when we try to get a complete
$D_q$ spectrum by combining results from (12) and (13) for disjoint ranges of $q$, while using
different distributions on different supports in each generating function. The underlying point set
is in each case taken to be a finite number of points in the two-scale Cantor set. We consider
first a uniform partition with uneven probabilities ($p_1 > p_2$ in generation $n = 1$) and statistical
independence (for ease of calculation), whereas in the second case we take even probabilities
$p_1 = p_2 = 1/2$, also with statistical independence, on the uneven but optimal partition with $l_1 > l_2$.
We can cover the finite (or infinite) point set in the first case by using an efficient uniform (but
not optimal) partition based on $l_1$, so that we get $D_\infty = \ln p_1/\ln l_1$ and $D_{-\infty} = \ln p_2/\ln l_1$.
If in the second case we use (13) with $P_i = N_n^{-1} = 2^{-n}$, then the end points are $-\ln 2/\ln l_1$ and
$-\ln 2/\ln l_2$. The separate spectra do not lie on top of each other because the end points $D_\infty$
and $D_{-\infty}$ do not coincide. Clearly, we cannot combine these two different calculations in order to
estimate disjoint parts of the $D_q$ spectrum of either case.
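The mismatch of the end points can be checked numerically. The following is a minimal sketch, not part of the original analysis: the values of p1, p2, l1 and l2 are hypothetical and chosen only for illustration, and the function names are ours. Because of statistical independence, the extreme scaling indices $\ln P/\ln l$ are already fixed by the first generation.

```python
import math

def endpoints_uniform_partition(p1, p2, l1):
    """Extreme indices ln(P)/ln(l) for weights p1, p2 on intervals of equal size l1."""
    alphas = [math.log(p) / math.log(l1) for p in (p1, p2)]
    return min(alphas), max(alphas)

def endpoints_optimal_partition(l1, l2):
    """Extreme indices ln(P)/ln(l) for equal weights 1/2 on intervals l1 and l2."""
    alphas = [math.log(0.5) / math.log(l) for l in (l1, l2)]
    return min(alphas), max(alphas)

# Hypothetical parameter values, used only to illustrate the argument.
P1, P2 = 0.7, 0.3
L1, L2 = 0.4, 0.2

case1 = endpoints_uniform_partition(P1, P2, L1)
case2 = endpoints_optimal_partition(L1, L2)
print("uniform partition, uneven weights:", case1)
print("optimal partition, even weights:  ", case2)
```

For these illustrative values the two pairs of end points differ, so the two spectra cannot be pieces of one and the same $D_q$ curve.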



5 Are galaxy distributions scale invariant? 

In this search for scaling we use only the correlation "integrand" $n_i(r)$, not the correlation integral
$n(r)$, for the reasons explained in parts 3.6 and 4. We calculate the number of galaxies within
a sphere of radius r centered on each galaxy as depicted in Figure ^. We only use spheres that 
are completely within the sample geometry. Since we want to investigate the scaling properties we 
are not allowed to apply any boundary corrections that assume stationarity of the distribution of 
galaxies with respect to translations (i.e. homogeneity) or rotations (i.e. isotropy). Such "correc- 
tions" would tend to introduce a spurious scaling with dimension three. Boundary "corrections" 
are inherent in all the usually used estimators for the two-point correlation function, see e.g. [ |iio| . 
Similar corrections were used for estimators of the correlation integral see e.g. Q]. For the same 
reasons as above we use volume limited samples. With flux (or magnitude) limited samples one
usually uses a weighting scheme based on the selection function. In weighting with the selection
function one assumes homogeneity by giving galaxies a weight proportional to the mean density,
which is determined mainly from the nearby regions.
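The counting procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: the sample volume is taken to be a sphere of radius `sample_radius` centered on the origin (the real samples are pie-shaped), and the function name `neighbour_counts` is ours. No boundary correction is applied; spheres that leave the sample are simply discarded.

```python
import numpy as np

def neighbour_counts(points, radii, sample_radius):
    """n_i(r) for every galaxy i and radius r, without boundary corrections.

    Entries are NaN wherever the sphere of radius r around galaxy i is not
    completely inside the (assumed spherical) sample volume.
    """
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Pairwise separations |x_i - x_j|.
    sep = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(sep, np.inf)  # do not count galaxy i itself
    counts = (sep[:, :, None] <= radii[None, None, :]).sum(axis=1).astype(float)
    # Largest radius for which the sphere around galaxy i stays inside the sample.
    r_max = sample_radius - np.linalg.norm(points, axis=1)
    counts[radii[None, :] > r_max[:, None]] = np.nan
    return counts

# Tiny hand-made example: three points on the x axis inside a unit sphere.
demo_points = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.9, 0.0, 0.0]]
demo_counts = neighbour_counts(demo_points, [0.15, 0.5], 1.0)
print(demo_counts)
```

The third point lies close to the boundary, so both of its spheres leave the sample and its row contains only NaN.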

We only show galaxies in the plots where we have at least one neighbour in a radius range
larger than $\Delta_r$, our "scaling" range (with the limited data available it does not make sense to use
decades, since we have at most one decade available). To perform the scaling analysis
more quantitatively we fit a straight line to the $\log(n_i)$-$\log(r)$ plot and determine the slope, again
only for galaxies having a "scaling" range larger than $\Delta_r$.
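A minimal sketch of such a slope fit is given below. The interpretation of the "scaling" range as the span of radii with at least one neighbour is our assumption, as are the function name `local_slope` and the argument `min_range`; the fit is an ordinary least-squares line in log-log coordinates.

```python
import numpy as np

def local_slope(radii, counts, min_range):
    """Least-squares slope of log n_i(r) against log r for one galaxy.

    Returns None when the usable radii (finite, non-zero counts) span no
    more than min_range, the assumed stand-in for the "scaling" range.
    """
    radii = np.asarray(radii, dtype=float)
    counts = np.asarray(counts, dtype=float)
    ok = np.isfinite(counts)
    ok[ok] = counts[ok] > 0          # keep radii with at least one neighbour
    if ok.sum() < 2:
        return None
    r = radii[ok]
    if r.max() - r.min() <= min_range:
        return None
    slope, _intercept = np.polyfit(np.log10(r), np.log10(counts[ok]), 1)
    return slope

# Synthetic check: counts growing exactly like r^2 give a slope of 2.
demo_r = np.array([1.0, 2.0, 4.0, 8.0])
print(local_slope(demo_r, demo_r ** 2, min_range=3.0))
```

Collecting the returned slopes over all galaxies that pass the range requirement gives the frequency tables discussed in the following subsections.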

We are severely limited by the small number density (or equivalently, the small sample size)
of the catalogues. Therefore we will not be able to extend $\Delta_r$ over more than one decade, which is
obviously too small to draw any conclusions about scaling (see Figure [[]). To estimate the influence
of $\Delta_r$ on the distribution of the slopes we look at three different values of $\Delta_r$. To get a Geilo-range
of scales from pie-shaped samples (as in figure ^) we would need observational data extending over
several thousand $h^{-1}$ Mpc.



5.1 The CfA I galaxy catalogue 

We look, as an example of optically selected data, at the CfA I catalogue, a magnitude limited
catalogue consisting of 1880 galaxies (Huchra). Using this data set, fractal and multifractal
analyses were performed e.g. by Coleman & Pietronero [[[3) or by Martinez et al. E] .

First consider the volume limited sample with 40 $h^{-1}$ Mpc depth, containing 360 galaxies in total
with a mean separation of 9.5 $h^{-1}$ Mpc. In Figure 7 we show, for 35 arbitrarily selected galaxies of
this sample, the number of neighbours $n_i(r)$ against the radius $r$ of the sphere. In this case we
demand a "scaling" range of $\Delta_r > 3.1\,h^{-1}$ Mpc, resulting in 157 galaxies having a (more or less
well) defined slope. In Figure 8 we show the scaling properties for 35 arbitrarily selected galaxies
from the 67 galaxies with "scaling" range $\Delta_r > 6.2\,h^{-1}$ Mpc, and in Figure 9 we show the scaling
properties of all 22 galaxies which have a "scaling" range $\Delta_r > 9.3\,h^{-1}$ Mpc, spanning roughly
one decade. As a first impression one recognizes that the scatter in the slope is large and does not
decrease for larger $\Delta_r$, although a larger range should result in more reliable estimates for the
slope. To make this impression more quantitative we plot the frequency of the slope for each of
these samples in Figure 10. One has to bear in mind that this frequency table is, in the case of
$\Delta_r > 9.3\,h^{-1}$ Mpc, constructed from 22 galaxies only. In all three cases, the slope fluctuates
over a range of 1.25 to 2.5.

Figure 7: Plots of $n_i(r)$ against $r$ (logarithmic axes) for the volume limited sample with 40 $h^{-1}$ Mpc
depth (solid line) and "scaling" range $\Delta_r > 3.1\,h^{-1}$ Mpc. The dotted line is the fit, the long dashed
line is for $n_i(r) \propto r^2$ and the short dashed line for $n_i(r) \propto r^3$.

Figure 10: The frequency of the slopes for the volume limited sample with 40 $h^{-1}$ Mpc depth,
for the sample with $\Delta_r > 3.1\,h^{-1}$ Mpc (stars), with $\Delta_r > 6.2\,h^{-1}$ Mpc (open squares), and with
$\Delta_r > 9.3\,h^{-1}$ Mpc (crosses).

Figure 11: The frequency of the slopes for the volume limited sample with 60 $h^{-1}$ Mpc depth,
for the sample with $\Delta_r > 4.7\,h^{-1}$ Mpc (stars), with $\Delta_r > 9.4\,h^{-1}$ Mpc (open squares), and with
$\Delta_r > 14.1\,h^{-1}$ Mpc (crosses).

In Figure 11 we plot the frequencies of the slopes for the volume limited sample with 60 $h^{-1}$ Mpc
depth, consisting of 215 galaxies with mean separation 24 $h^{-1}$ Mpc. Imposing the constraints for the
"scaling" range we are left with 67 galaxies for $\Delta_r > 4.7\,h^{-1}$ Mpc, 31 galaxies for $\Delta_r > 9.4\,h^{-1}$ Mpc,
and only 9 galaxies for $\Delta_r > 14.1\,h^{-1}$ Mpc. We see that no conclusions are possible with so few
points. The peak around 1.75 for $\Delta_r > 14.1\,h^{-1}$ Mpc (determined from only 9 galaxies!) is
mainly due to sampling galaxies from the same region of space in the center of the sample (see
Figure ||) .

5.2 The IRAS 1.2 Jy galaxy catalogue 

Now we look at a sample of IRAS selected galaxies with limiting flux 1.2 Jy (Fisher et al. |2^])
consisting of 5313 galaxies. The big advantages of this sample are the nearly complete coverage of
the sky and the homogeneous flux calibration. Fractal and multifractal analyses of IRAS galaxies
were performed e.g. by Martinez & Coles, Xia et al. Q and Labini et al. J72|. The last
mentioned authors claim that the IRAS samples are too sparse to estimate fractal dimensions
reliably.



We proceed similarly to the analysis of the CfA catalogue in section 5.1. As discussed in Kerscher et al. |4lJ
we find differences between the northern and southern hemisphere, but since we do not want to
focus on this topic we show the results for the combined data only.

In Figure 12 we show for 35 randomly selected galaxies the number of neighbours $n_i(r)$ against
the radius of the sphere $r$ for galaxies in the volume limited sample with 80 $h^{-1}$ Mpc depth
(mean separation of the galaxies is 24 $h^{-1}$ Mpc). Restricting ourselves to a "scaling" range of
$\Delta_r > 6.3\,h^{-1}$ Mpc we are left with 359 galaxies of the total 788 galaxies in the volume limited
sample. In Figure 13 we show the scaling properties for 35 randomly selected galaxies from the
167 galaxies with a "scaling" range of $\Delta_r > 12.6\,h^{-1}$ Mpc, and in Figure 14 35 randomly selected
galaxies from the 72 galaxies with a "scaling" range of $\Delta_r > 18.9\,h^{-1}$ Mpc. Again we have a
"scaling" range only spanning roughly one decade.

The sample with $\Delta_r > 6.3\,h^{-1}$ Mpc is clearly inappropriate for an analysis, since often only one
neighbouring galaxy is within the scaling range, giving rise to a spurious slope of zero. Again we
see a large scatter in the slopes for all $\Delta_r$, which does not decrease if we go to higher $\Delta_r$.

To make this more quantitative we again plot histograms of the slopes, now for a sequence of
volume limited samples. The sample with 40 $h^{-1}$ Mpc depth containing 646 galaxies is shown in
Figure 15. The sample with 60 $h^{-1}$ Mpc depth containing 880 galaxies is shown in Figure 16. The
sample with 80 $h^{-1}$ Mpc depth with 788 galaxies is shown in Figure 17.

5.3 Discussion of the results 

In both catalogues the local scaling exponents $\nu(i)$ fluctuate over a broad range (see Figures 15 -
17 and Figures 10, 11), indicating that there is no global monofractal scaling. This mainly tells
us that the distribution admits large fluctuations. From the limited data we are not able to
judge whether these fluctuations are scale invariant (over three decades) or not. Therefore, fractal
scaling, and certainly multifractal scaling, cannot be deduced from the limited data as claimed in
e.g. j4^], ^0|, |72j. The fact that the correlation integral (or the two point correlation
function) apparently scales with one exponent is in this case not related to the scaling of the
galaxy distribution (again, see Figure Q). We want to emphasize that the broad range of different
slopes is not a sign of multifractality; it only shows that we have large fluctuations. In e.g. |62] ]
the authors claim to see scale invariance over three decades (from 1 $h^{-1}$ Mpc to 1000 $h^{-1}$ Mpc).
This statement is based on the scaling properties of only one $n_i(r)$. In this case $n_i(r)$ should
be the number of galaxies in a sphere centered on our galaxy. However, the analysis is carried
out by using only intersections of a cone with a sphere, as discussed in ]72j . Such an analysis is
inconsistent with their own definition of fair sampling, very badly violating Fig. 8 of |^3[ .

Figure 12: Plots of $n_i(r)$ against $r$ (logarithmic axes) for the volume limited sample with 80 $h^{-1}$ Mpc
depth (solid line) and "scaling" range $\Delta_r > 6.3\,h^{-1}$ Mpc. The dotted line is the fit, the long dashed
line is for $n_i(r) \propto r^2$ and the short dashed line for $n_i(r) \propto r^3$.

Figure 13: Same as Figure 12 but with "scaling" range $\Delta_r > 12.6\,h^{-1}$ Mpc.

Figure 14: Same as Figure 12 but with "scaling" range $\Delta_r > 18.9\,h^{-1}$ Mpc.

Figure 15: The frequency of the slopes for the volume limited IRAS sample with 40 $h^{-1}$ Mpc
depth, for the sample with $\Delta_r > 3.1\,h^{-1}$ Mpc (stars), with $\Delta_r > 6.2\,h^{-1}$ Mpc (open squares), and
with $\Delta_r > 9.3\,h^{-1}$ Mpc (crosses).

Figure 16: The frequency of the slopes for the volume limited IRAS sample with 60 $h^{-1}$ Mpc
depth, for the sample with $\Delta_r > 4.7\,h^{-1}$ Mpc (stars), with $\Delta_r > 9.4\,h^{-1}$ Mpc (open squares), and
with $\Delta_r > 14.1\,h^{-1}$ Mpc (crosses).

Figure 17: The frequency of the slopes for the volume limited IRAS sample with 80 $h^{-1}$ Mpc
depth, for the sample with $\Delta_r > 6.3\,h^{-1}$ Mpc (stars), with $\Delta_r > 12.6\,h^{-1}$ Mpc (open squares), and
with $\Delta_r > 18.9\,h^{-1}$ Mpc (crosses).

In fitting straight lines to a log-log plot over roughly one decade we have performed a
superficial data analysis, one that gives excessive weight to galaxies near the center of the sample.
Unfortunately nothing more is possible without postulating "boundary corrections" that have no
basis in observation, or without even more unfairly weighting points near the center of a conical
sample more heavily than those near a boundary. But even in doing so, we do not find any
indication of global scaling. The main message is that the current data are insufficient for a reliable
scaling analysis. We end this section with a quote from ". . . if a sample contains too few
points there may be no way to get any information from it. In such a case one has to wait for
better (observational) data."



6 Generating functions 

The term multifractal is defined in Jones J3(| by requiring only that the moments of an arbitrary
distribution $P(X, l)$,

$$\langle X_l^{q-1} \rangle = \sum_X P(X, l)\, X^{q-1}, \qquad (31)$$

where $X$ is a random variable defined in or on intervals of size $l$, should scale,

$$\langle X_l^{q-1} \rangle \approx l^{\zeta_q}, \qquad (32)$$

for small enough interval sizes $l$. However, without further requirements on $X$ and $P(X, l)$, there
is no reason to expect that scaling exponents in (32), if they exist at all for a given distribution
$P(X, l)$, bear any relation to generalized dimensions $D_q$ derivable in the infinite precision limit
from multifractal spectra $D(X)$ or $f(\alpha)$. To be specific, is the partition nonoverlapping? Efficient?
What is fractal about $X$ or $P(X, l)$? We will show via example in part |?j that equation (31) with
a scaling law (32) generally does not describe intermittence due to voids, and that if one tries in
those cases to force the definition $\zeta_p = (p-1)D_p$ (or, as in Q , $\zeta_p = (1-p)D_p$) then the $D_p$ are not
dimensions describing the support of the probability distribution, or of any other coarsegrained
set or subset connected with an arbitrary distribution $P(X, l)$.

The generating function (31) combined with the scaling expectation (32) is used in definitions
of multiaffine fractals (j5j), where a deterministic or random variable (or field) $X$ is continuous but
has singular (or no) derivatives. If the distribution of singularities of the field $X$ can be described
locally by writing $X \approx l^h$ and $P(X, l) \approx l^{-f(h)}$, then (31) may or may not yield scaling exponents
$\zeta_p$ that give rise to a spectrum of generalized dimensions, defined by $(p-1)D_p = p\,h(p) - f(h(p))$ in
the limit $l \to 0$, where $D_0$, $D_1$ and $D_2$ really are fractal dimensions of something in the model. This
happens only when $f(h)$ describes a spectrum of fractal dimensions. Otherwise $h$ and $f(h)$ are
just a rewriting of a nonfractal probability distribution $P(X, l)$ via a coordinate transformation,
and nonfractal distributions cannot be made fractal (or the converse) by a differentiable coordinate
transformation. Stated another way, $f(h)$ is not a spectrum of Hausdorff dimensions unless "$l$"
represents an optimal or at least efficient partition of the support of an underlying pointwise
distribution $P(x)$. Examples in the literature where $X \approx l^h$ with $P(X, l) \approx l^{-f(h)}$ are used
without any requirement of efficient partitioning of a support are height fluctuations in surface
roughening (||), self-organized criticality and velocity structure functions in the inertial
range of fluid turbulence ( p4| ). In contrast with multiaffine fractals (where there is no idea of a
generating partition), for a self-similar fractal (heretofore called "fractal") the point set is generally
spatially-fragmented (Koch and Peano curves are, however, continuous), like a Cantor set, and $P_i$
scales like $l_i^{\alpha_i}$ (not like $l_i^{-f(\alpha_i)}$) in order to describe a highly singular density
$\rho_i = P_i/l_i = l_i^{\alpha_i - 1}$ on the optimal partition describing the support of $P(x)$.

In a spirit similar to the attempt to define multifractal by using the moments (31) of an
arbitrary distribution $P(X, l)$ with a scaling law (32), the definition

$$\langle P^{q-1} \rangle = \sum_P p(P, l)\, P^{q-1}, \qquad (33)$$

where the probability distribution $p(P, l)$ is undefined, is treated in various places (|l^]) as if it
were identical with the generating function

$$\chi_n(q) = \langle P^{q-1} \rangle_{\rm coarsegrained} = \sum_{i=1}^{N_n} P_i\, P_i^{q-1}, \qquad (34)$$

although it is not.

For an arbitrary probability distribution $p(P, l)$ these two generating functions are not even
related; their scaling exponents (if scaling exponents exist in either case) are not necessarily the
same even if the generating functions are qualitatively related. Multifractal spectra and
generalized dimensions are not universal. Instead, they change with the histograms and their support.
Furthermore, any deviation from the empirical distribution $P(x)$ and its optimal coarsegrained
descriptions $\{P_i\}$ is equivalent to changing the underlying data set. Rather than imagining that
the empirical distribution $P(x)$ can be treated as a random field that fluctuates from one galaxy
sample to the other (as in Jones |35|), we view different $P(x)$'s from different samples as disjoint
pieces of a single global distribution of galaxies whose local properties can be discovered
empirically, but whose entire global (one hesitates to say "universal") aspect can never be known due to
the inherent limitations on observation. We do not want to try to replace what we do not know
($P(x)$ measured globally) with speculations that cannot be tested ($p(P, l)$ postulated globally).
The notion of statistical ensembles is useless here: there is only one universe, and it is not in
equilibrium.

The source of confusing together entirely different generating functions can be traced to the
use of the infinite precision limit in papers on dynamical systems theory written more than ten
years ago. It is suggested in |3l| (see also [^7|) that (34) is analogous to the Lebesgue integral

$$\langle P(B_l(x))^{q-1} \rangle = \int dP(x)\, P(B_l(x))^{q-1} \qquad (35)$$

and should yield the same generalized dimensions $D_q$ in the infinite precision limit, where $P(x)$
is supposed to be "the natural invariant measure" of a chaotic dynamical system on a strange
attractor (see ^9j, e.g., for one definition of "natural") and $P(B_l(x))$ is the fraction of points
lying within a ball of size $l$ covering (but not necessarily centered on) a data point $x$. There are
two serious difficulties with the attempt to replace (34) by (35), and both revolve about lack of
uniqueness.

First, there is no evidence from observation that a chaotic dynamical system generates "a
natural measure" for the various initial conditions (meaning also "present conditions") found in
nature. Mathematically seen, a chaotic dynamical system can generate infinitely many different
distributions ("measures") $P(x)$ for infinitely many different classes of initial conditions ( p3[ &
Q). Empirically, the dependence on initial conditions is not a problem: the data are described
by the empirical staircase

$$P(x) = \frac{1}{N} \sum_{i=1}^{N} \theta(x - x_i). \qquad (36)$$

Without having made any theoretical assumptions that prejudice the data analysis we can say
that the initial conditions, whatever they were, produced the empirical distribution $P(x)$ via the
time evolution of some dynamical system.

Given the empirical measure (36) there is still ambiguity inherent in the attempt to use (35)
as a replacement for (34). In finite precision there are different possible definitions of the integral,
depending on which subset of the data set we decide to measure (before we can identify the
function $P(B_l(x))$ we must first define "$l$").

If we choose the balls/intervals $B_l(x)$ to have arbitrary length $l$, centered on a data point $x_i$ (as
in |p9[), then the fraction of points lying within each interval of size $l$ is given by $P(B_l(x)) = n(x, l)$
where $n(x_i, l) = n_i(l)$ is the correlation integrand

$$n_i(l) = \frac{1}{N} \sum_{j \neq i}^{N} \theta(l - |x_i - x_j|). \qquad (37)$$

Deleting the term $j = i$ in (37) is unimportant if the intervals are large enough to give "good
statistics" (pedantically, one can also replace the factor $1/N$ by $1/(N-1)$ in (37)). Insertion of
the pointwise definitions $P(B_l(x)) = n(x, l)$ and

$$dP(x) = \frac{1}{N} \sum_{i=1}^{N} \delta(x - x_i)\, dx \qquad (38)$$

into the integral (35) yields the correlation integral generating function

$$\int dP(x)\, P(B_l(x))^{q-1} = \frac{1}{N} \sum_{i=1}^{N} n_i(l)^{q-1} = G_n(q), \qquad (39)$$

which differs significantly from (34) in data analysis, as we have emphasized in section

In dynamical systems theory the $N$ intervals can in principle be chosen small enough not to
overlap with each other: on a mathematically-defined strange attractor there are $2^{\infty}$ points in any
neighborhood of any arbitrary point $x_i$ on the attractor ($2^{\infty}$ is the cardinality of the attractor).
Here, the $N$ intervals (or balls) $B_l$ of size $l$ can be chosen small enough not to overlap, but certainly
do not partition the attractor efficiently, if at all. Equation (39), which was not invented with
partitioning in mind, is merely a time average over $N$ points on the attractor, and the uniform
weight $1/N$ is correct because each point $x_i$ occurs exactly once (so long as trajectories of the
dynamical system are unique, which we assume here). In nonlinear dynamics calculations the
number $N$ of points may be increased by increasing the precision of the calculation. In cosmology,
in contrast, $N$ is the total number of galaxies in a finite sample, so that the $N$ intervals of size $l$
are always overlapping.
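For a finite data set the correlation integrand and the generating function built from it can be evaluated directly. The sketch below assumes points on a line; the function names are ours, and dropping points with no neighbour (so that the sum stays finite for $q < 1$) is our choice, not a prescription of the text.

```python
import numpy as np

def correlation_integrand(x, l):
    """n_i(l): fraction of the other points lying within distance l of x_i."""
    x = np.asarray(x, dtype=float)
    sep = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(sep, np.inf)        # delete the term j = i
    return (sep <= l).sum(axis=1) / len(x)

def G(x, l, q):
    """Correlation integral generating function with uniform weight 1/N."""
    n_i = correlation_integrand(x, l)
    n_i = n_i[n_i > 0]   # assumption: skip isolated points so q < 1 stays finite
    return np.sum(n_i ** (q - 1)) / len(x)

# Four points on a line; the last one has no neighbour within l = 1.5.
demo_x = [0.0, 1.0, 2.0, 10.0]
print(correlation_integrand(demo_x, 1.5))   # one entry per data point
print(G(demo_x, 1.5, 2))                    # q = 2 gives the correlation integral
```

Note that the $N$ balls used here overlap and are centered on the data points; nothing in this construction partitions the point set.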

There is a different way to define balls $B_l(x)$ and a corresponding function $P(B_l(x))$. Instead
of choosing $N$ uniform intervals where $N$ is the number of data points, requiring the pointwise
definition $P(B_l(x)) = n(x, l)$ as given by (37) (which does not include a partition of the data set),
we can instead choose our balls $B_l$ to be the $N_n$ intervals $\{l_i\}$ in the optimal partition of the data.
Then, $P(B_l(x))$ is given by the simple function

$$P(B_l(x)) = \sum_{i=1}^{N_n} P_i\, \chi_{l_i}(x) \qquad (40)$$

where $\chi_{l_i}(x)$ is the characteristic function for the partition $\{l_i\}$ of disjoint intervals |36| and

$$P_i = P(x_i + l_i) - P(x_i) = \frac{1}{N} \sum_{j=i}^{i+n_i-1} \theta(x_{i+n_i} - x_j) = n_i/N. \qquad (41)$$

Here, $x_i$ and $x_{i+n_i} = x_i + l_i$ are taken to be the end points of any of the $N_n$ optimal intervals $\{l_i\}$.
With this optimal choice of "what to measure" (optimal choice of function $P(B_l(x))$ to integrate
with respect to the measure $P(x)$) the integral (35) yields

$$\int dP(x)\, P(B_l(x))^{q-1} = \sum_{i=1}^{N_n} P_i^q = \chi_n(q). \qquad (42)$$

From the standpoint of both data analysis and measure theory the only significant difference
between these two choices of $P(B_l(x))$ is the lack of a partition in the first and the use of an optimal
partition to define the second. Whether these two approaches do or do not, in the limit $l_n \to 0$ for
a mathematical fractal of cardinality $2^{\infty}$, yield the same generalized dimensions (whether $D_q = \nu_q$
as $l_n \to 0$) is of no importance whatsoever for the analysis of empirical data.



7 Lognormal distribution 

What has lognormal to do with multifractal? The question arises because it has been asserted 
that the lognormal distribution is multifractal, that it defines a spectrum of generalized dimensions 
(|57) & H^]) and an f(a) spectrum (0). Before answering this question we review how and 
where the lognormal distribution appears in discussions of multifractals, where a multifractal 
spectrum (as defined in this paper, following Halsey p5] | ) describes the spectrum of dimensions of 
nonoverlapping subsets of the support of a probability distribution. 



With the discussion of |37|] in mind, but following 1 30 , let us make a largest term
approximation on the generating function (12). With $q$ fixed we first locate the largest term in (12)
by minimizing the exponent $(q\alpha - f(\alpha))$, yielding $q = f'(\alpha(q))$. Very near (and only very
near) to the smallest exponent $\tau(q) = q\alpha(q) - f(\alpha(q))$, where $\alpha(q) = \tau'(q)$, we can write
$q\alpha - f(\alpha) \approx \tau(q) - (\alpha - \alpha(q))^2 f''(\alpha(q))/2$. According to a standard method (0) we next replace
the sum over all these nearby terms by the integral

$$\chi_n(q) \approx l_n^{\tau(q)} \int_{\delta\alpha} d\alpha\, \rho(\alpha)\, l_n^{-(\alpha - \alpha(q))^2 f''(\alpha(q))/2} \qquad (43)$$

where the range of integration $\delta\alpha$ is the tiny region in $\alpha$ containing all of the exponents
$(q\alpha - f(\alpha))$ that do not deviate from the minimum exponent $\tau(q)$ more than quadratically in
$(\alpha - \alpha(q))$. This quadratic approximation to deviations of the exponent $(q\alpha - f(\alpha))$ from the
minimum $\tau(q)$ only works as $l_n \to 0$, in which case the integrand in (43) is sharply enough peaked
that, with small error, we may extend the integration limits to plus and minus infinity. Clearly, a
locally (not globally) Gaussian approximation to deviations from the minimum exponent $\tau(q)$ is
the same as saying that the deviations of $N(\alpha)\,(l_n^{\alpha})^{q}$ from $l_n^{\tau(q)}$ are locally (not globally)
lognormal. This local lognormality only contributes to the prefactor in (43) in the unphysical limit
where $l_n \to 0$, and not to the $f(\alpha)$ spectrum described by the exponent $\tau(q)$. Any time that an
exponent $h$ has a Gaussian distribution then the function $p(h) = l^h$ is distributed lognormally (see
p2| for an example from percolation theory, where permeabilities $\kappa$ of sandstone and limestone
deposits were long thought to be approximately lognormally distributed, with Gaussian porosities
$\phi$, where $\kappa \approx l^{\phi}$). This has nothing to do with the question whether the lognormal distribution is
multifractal.

There are two ways, related mathematically to the above approximation, in which the lognormal
distribution is called multifractal in |36| and p7| |. Jones [[36| asserts that (31) with (32) defines
multifractal, where $\zeta_p = -(p-1)D_p$ and the $D_p$ are supposed to be generalized dimensions
derivable from the Halsey method as well. This constitutes an entanglement of unrelated ideas.
Following Jones we use $X = l^h$ in (31) along with a lognormal distribution of $X$ to compute
$\langle l^{ph} \rangle$ (the exponent $h$ is Gaussian with mean $\langle h \rangle$ and mean square fluctuation $\sigma^2 = \langle (h - \langle h \rangle)^2 \rangle$).
Using Jones's |5(| second definition of $\zeta_p$ (instead of his first), where $\zeta_p = \ln(\langle X^p \rangle / \langle X \rangle^p)$, we obtain
$\zeta_p = \exp(2\sigma^2 (\ln l)\, p(p-1))$. In other words a scaling law (32) generally does not follow: lognormal
distributions per se, inserted into (31), do not yield scale invariance (32), because the expected
scaling exponent depends on $\ln l$. A scaling law (32) follows only if we restrict to lognormal
distributions where $\sigma^2$ is proportional to $-1/\ln l$. In this case we obtain $D_p = -\langle h \rangle p$, where
$D_2 < D_1 < D_0 = 0$. $D_0 = 0$ is not the dimension of the support of the lognormal distribution
(where $D_H = 1$), and the scaling exponents $D_p$ are not the dimensions of anything else in that
distribution. (We adhere to the assumption that fractals and multifractals are generated by
deterministic dynamics, and do not consider the so-called "random fractals".) Jones's refinement
of (32) is, in this case, equivalent to the imposition of the constraint $\zeta_1 = 0$ in turbulence modelling
(see Frisch @).
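The dependence of the apparent exponent on $\ln l$ can be made explicit with the exact moments of a Gaussian exponent. The sketch below is an illustration only: it uses the standard identity $\langle e^{sh} \rangle = \exp(s\langle h \rangle + s^2\sigma^2/2)$ for Gaussian $h$, so numerical prefactors follow that convention and may differ from the normalization used in the text. The function names and parameter values are ours.

```python
import math

def log_moment_ratio(p, l, mean_h, var_h):
    """ln(<X^p> / <X>^p) for X = l**h with Gaussian h (exact Gaussian moments)."""
    t = math.log(l)

    def log_moment(k):
        # ln <l^(k h)> = k <h> ln l + (k ln l)^2 sigma^2 / 2
        return k * mean_h * t + 0.5 * (k * t) ** 2 * var_h

    return log_moment(p) - p * log_moment(1)

def effective_exponent(p, l, mean_h, var_h):
    """Exponent zeta such that <X^p>/<X>^p = l**zeta at this particular l."""
    return log_moment_ratio(p, l, mean_h, var_h) / math.log(l)

# Fixed variance: the "exponent" drifts with ln l, so there is no scaling law.
for l in (1e-1, 1e-2, 1e-3):
    print(l, effective_exponent(2, l, mean_h=1.0, var_h=0.5))

# Variance proportional to -1/ln l (hypothetical constant c): exponent is constant.
c = 0.3
for l in (1e-1, 1e-2, 1e-3):
    print(l, effective_exponent(2, l, mean_h=1.0, var_h=-c / math.log(l)))
```

With a fixed variance the effective exponent grows in proportion to $\ln l$; only the ad hoc choice $\sigma^2 \propto -1/\ln l$ makes it independent of $l$.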

To try to model an eddy cascade in fluid turbulence one must evaluate the average $\langle l^{ph} \rangle$
where $l$ represents the size of an $n$th generation eddy, and $h$ is supposed to be an exponent
analogous to $\alpha$ in multifractal spectra. There, Frisch [Q makes a different identification than
Jones, namely, that $(p-1)D_p = \zeta_p + 3(p-1)$ for a lognormal distribution. This yields $D_0 = 3$
(space-filling support) but the exponents $D_p$ for $p \neq 0$ are not dimensions of anything in the
model. The origin of this apparent (from our standpoint) mislabeling of scaling exponents as
multifractal is that Frisch defines $f(\alpha)$ spectra (and consequently $D_p$ spectra) differently than we
have. His definition is not designed to agree with the fractal dimensions $f(\alpha)$ describing singular
distributions à la Halsey et al. |3(J, but describes instead the Cramér function in statistics (J43J). A
Cramér function may exist where nothing is fractal. A Cramér function, by construction, describes
distributions of independent random variables $h_i$ or $\alpha_i$ in the limit where $n$ goes to infinity ($l$ goes
to zero). In contrast, the indices $\alpha$ in ([I|) and © are not random variables: they are scaling
indices describing coarsegrained probabilities (and occupation numbers $n_i = N P_i$) $P_i = l_i^{\alpha_i}$.

The Cramér function is a systematic way of obtaining a description whereby $X \approx l^h$ and
$P(X, l) \approx l^{-f(h)}$ via a limit theorem in classical statistics, for the case where the $h_i$ are
independent random variables, and has no necessary connection with the idea of spectra of
Hausdorff dimensions, or generalized dimensions derivable from spectra of Hausdorff dimensions. The
Cramér function is based on the law of large numbers and appears in classical equilibrium
statistical mechanics, for example. This approach can be used to describe an alternative version of
lognormality discussed in M: in that case their spectrum $f(\alpha) \approx D - (\alpha - \alpha_0)^2/(4(\alpha_0 - D))$
describes the integrand of (31), and not a scaling law (32) that might follow as a consequence of
actually calculating the integral (31). A scaling law (32) does not follow at all without assuming
arbitrarily that $\sigma^2$ varies as $-1/\ln l$. With that restriction we obtain $(p-1)D_p = \zeta_p - (p-1)D_0$
with $\zeta_p = -\alpha_0 p$, analogous to Frisch's result quoted above, but where we should now choose
$D_0 = 1$ in order to describe the support of the lognormal distribution (otherwise, $D_0$ is not the
fractal dimension of anything in the model). Even with this choice the remaining $D_p$ are not
fractal dimensions, even for $p = 1, 2$, of any aspect of the lognormal distribution. However, the
same lognormal distribution can be understood from an entirely different standpoint: namely, as a
combination of the Gaussian integrand with the exponent $\tau(0) = -D_0$ in equation ( p3| ) of Halsey
et al above. In this case $D_0$ is not the Hausdorff dimension of the support of the lognormal
distribution, and the linear spectrum $D_p$ describes only the region near the peak of an unknown
$f(\alpha)$ spectrum, a spectrum of box-counting dimensions where $f_{max} = D_0$ (from the standpoint of
eqn. (^) above one would interpret the $\zeta_p$ spectrum derived from the lognormal distribution in
turbulence modelling as the fragment of an $f(\alpha)$ spectrum that is valid only for very small values
of $p$, near $p = 0$, although this is certainly not the traditional interpretation). On the other hand,
the generalization of the lognormal approximation represented by Jones's ]35| ] equations
(14) and (15), and (30), are not definitions of $f(\alpha)$ à la Halsey et al. [Q, but represent instead
the idea of a Cramér function in classical statistics (see Frisch '95): unless $f(\alpha)$ arises as the
spectrum of Hausdorff dimensions from an infimum condition on partitions, $f(\alpha)$ is not (by
the Halsey pY^ / definition) a multifractal spectrum. This entanglement of different ideas did not
originate in cosmology: one of the authors of Halsey et al. |3^] later used the Cramér function,
called it "multifractal", and cited Halsey et al. J3(J as the reference ( |I(J ) .
One can choose to follow Halsey et al. |30) in defining $f(\alpha)$ and $D_q$, or one can follow Mandelbrot
p3| in defining "$f(\alpha)$ and $D_q$" via Cramér functions, but one should not mix these two
different definitions together without comment as is done in |36| . We recommend the definition of
multifractal given in this paper as the standard because, in that case, $f(\alpha)$ is always the Hausdorff
dimension of a subset of the support of the empirical distribution $P(x)$. The necessity of an optimal
partition in order to define $f(\alpha)$ is implicit in Halsey et al. |30|| (see their "infimum" requirement)
but was not emphasized strongly enough at that time. The role played by the infimum requirement
became clear only after the later work on generating partitions in nonlinear dynamics ( |2lf| , |l6[|
and p4|| ), which is little-known within the community of cosmologists.

When is a distribution P(x) multifractal? To answer this question consider any distribution
P(x), empirical or theoretical, where P(0) = 1/N, P(1) = 1, and P(x) is nondecreasing. For
idealized differentiable distributions we have N = 2^∞ (which is the same as 10^∞, etc.) and
P(0) = 0. P(x) need not be differentiable, however, and generally isn't. In order to determine
whether P(x) "is multifractal" (admits a decomposition of its support into interwoven fractals with
different dimensions f(α)) one must determine the optimal partition and form the difference ΔP(x)
over each interval in that support to obtain the hierarchy of histograms {P_i} (for a differentiable
distribution any space-filling partition will do the job). Having done that, one then investigates
whether P_i ≈ l_i^{α_i} holds over N(α) ≈ l^{-f(α)} intervals in the support as the interval sizes are
systematically reduced. All fractal distributions (for points on a line) have a density ρ_i ≈ l_i^{α_i - 1}
that is dense with singularities because α_i < 1, and if the distribution is multifractal then the
indices α_i will vary over the support according to N(α) ≈ l^{-f(α)}. A highly fragmented (spiky)
coarsegrained density is typical of a multifractal distribution P(x).
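
The scaling test just described can be sketched for the simplest case, the binomial distribution on a space-filling support. The sketch below assumes a uniform bisection of the unit interval (a partition that is natural for this particular example, not an optimal partition in general), and the function name and parameter values are illustrative choices only. After n bisections each interval has length l = 2^{-n}, and the intervals sharing the same weight P_i are counted to estimate N(α).

```python
import math

def binomial_spectrum(p1, n):
    """Return (alpha, f) pairs for the binomial measure after n bisections.

    Each of the 2**n intervals has length l = 2**-n and weight
    P = p1**k * (1 - p1)**(n - k); comb(n, k) intervals share the same k.
    """
    p2 = 1.0 - p1
    log_l = -n * math.log(2.0)                      # log of the interval length
    spectrum = []
    for k in range(n + 1):
        log_P = k * math.log(p1) + (n - k) * math.log(p2)
        alpha = log_P / log_l                       # from P ~ l**alpha
        f = math.log(math.comb(n, k)) / (-log_l)    # from N(alpha) ~ l**(-f)
        spectrum.append((alpha, f))
    return spectrum

# Illustrative values: uneven probabilities give a range of alpha,
# equal probabilities collapse the spectrum onto alpha = 1.
uneven = binomial_spectrum(p1=0.7, n=200)
even = binomial_spectrum(p1=0.5, n=200)
print(min(a for a, _ in uneven), max(a for a, _ in uneven))
print(max(f for _, f in uneven))
print({round(a, 12) for a, _ in even})
```

With uneven probabilities the indices α spread over an interval and f(α) stays below the dimension of the support, while equal probabilities give the single value α = 1 expected of a uniform density.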

Gaussian distributions are not multifractal. Neither are lognormal distributions. No smooth,
differentiable distribution is multifractal because, by definition, such a distribution has a smooth
density on a support with integer dimension D_0. Smooth distributions can be differentiated
everywhere, corresponding to the requirement that f(α) = α = D_0 holds everywhere. The Cantor
function P(x) (in part 3.1) describes clustering and voids, the binomial distribution with p_1 ≠ p_2
on a space-filling support describes intermittence without voids (see |5^| for a physical example),
but the lognormal distribution cannot describe voids because it is differentiable. The distribution
defined by the density ρ(x) = dP(x)/dx ∝ (x(1 - x))^{-1/2} is not differentiable at x = 0 and 1 and
is bifractal (Halsey et al. [30]): α = f(α) = 1 for 0 < x < 1 but α = 1/2, f(α) = 0 for x = 0
and 1. The bifractal staircase of She et al. |6|] shows coalescence without voids only because a
continuum of initial conditions was used rather than a finite number (blocks of initial conditions
with voids should also produce coalescence with voids in that model).
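
The bifractal example can be checked numerically. The sketch below assumes the normalized density 1/(π sqrt(x(1 - x))), whose cumulative distribution is P(x) = (2/π) arcsin(sqrt(x)); this form is an assumption chosen to be consistent with the value α = 1/2 quoted for the endpoints. The local index is estimated as the slope of log ΔP against log l between two small interval sizes, both of which are arbitrary illustrative choices.

```python
import math

def P(x):
    """Cumulative distribution for the density 1 / (pi * sqrt(x * (1 - x)))."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

def local_exponent(x0, l_small=1e-8, l_large=1e-6):
    """Slope of log(Delta P) versus log(l) for intervals [x0, x0 + l]."""
    dP_small = P(x0 + l_small) - P(x0)
    dP_large = P(x0 + l_large) - P(x0)
    return math.log(dP_large / dP_small) / math.log(l_large / l_small)

print(local_exponent(0.0))   # edge of the support, where the density diverges
print(local_exponent(0.5))   # interior point, where the density is smooth
```

The estimate at the edge should come out close to 1/2 and the interior estimate close to 1, matching the two indices of the bifractal.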

Summarizing, nontrivial f(α) spectra guarantee an intermittent probability density, one
corresponding to a nondifferentiable probability distribution P(x). All fractal distributions have
singular densities. Having a fractal support guarantees a singular density ρ ≈ l^{α-1} with 0 < α < 1,
but multifractal scaling can also hold for inhomogeneous distributions (like those having statistical
independence with uneven probabilities) on a space-filling support.



8 Homogeneity, coarsegraining and hydrodynamics 

The singular "pointwise" density of matter ρ(x, y, z) in any epoch is determined by the empirical
staircase distribution P(x, y, z), the generalization of (Q) to three dimensions. A necessary but
insufficient condition for a coarsegrained density that is smooth enough to be approximable by a
differentiable function is that the support of the empirical distribution is space-filling, meaning
D_H = 3.

No information about D_H is provided by the correlation integral exponent ν (or by the
correlation dimension D_2) unless (1) the support is monofractal and the distribution is uniform
(α = f(α) = D_H < 3), or (2) the support is space-filling and the distribution is uniform or at
least differentiable (α = f(α) = D_H = 3). In both cases ν = D_2 = D_H. Otherwise, we know only
that ν < D_H and D_2 < D_H. An increase in ν with an increase in scale from intermediate toward
cosmologic scales, ≫ 1000 h^{-1} Mpc, even if it were found in the data, would not tell us
anything about D_H unless we found that ν = 3. So long as ν < 3 no information about D_H
is provided by ν. An increase of ν with increasing scale (as is claimed in | f30| , ||) would not imply
that there exists a large scale coarsegraining where D_H = 3. If D_H < 3 then the approximation
of the empirical distribution by a differentiable one is impossible. The same conclusion follows if
ν < D_H = 3.
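
A one-dimensional illustration of why the correlation dimension alone says nothing about the dimension of the support: for the binomial distribution on a uniformly bisected unit interval the sum of P_i^q over the 2^n intervals is (p_1^q + p_2^q)^n, so the generalized dimensions follow in closed form. The sketch below assumes this uniform bisection and the convention that the sum scales as l^{(q-1) D_q}; the function name and the probabilities are illustrative.

```python
import math

def generalized_dimension(p1, q):
    """D_q of the binomial measure on a uniformly bisected unit interval."""
    p2 = 1.0 - p1
    if q == 1:
        # q -> 1 limit (information dimension)
        return -(p1 * math.log2(p1) + p2 * math.log2(p2))
    return math.log2(p1 ** q + p2 ** q) / (1.0 - q)

for p1 in (0.5, 0.7):
    d0 = generalized_dimension(p1, 0)   # dimension of the support
    d2 = generalized_dimension(p1, 2)   # correlation dimension
    print(p1, d0, d2)
```

In both cases the support is space-filling (D_0 = 1 on the line), yet the correlation dimension equals 1 only for equal probabilities and falls below 1 for uneven ones.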

Fractal scaling of galaxies in an intermediate range, e.g. 1 < r < 1000 h^{-1} Mpc, would still allow
for the possibility of nonfractal matter distributions at the largest (or "cosmologic") scales. An
empirical matter distribution P(x, y, z) that shows no clusters and voids "at large enough scales"
l^(n) would require a support with dimension D_H = 3. Whether galaxies are distributed more or
less uniformly over a nonuniform space-filling support {l_{x,i} l_{y,i} l_{z,i}} is then a question of whether
P_i ≈ l_{x,i} l_{y,i} l_{z,i} holds over enough generations of the hypothetical "cosmologic-scale support" so
that one can define the derivatives of densities normally used in hydrodynamics. Only for a uniform
support would this condition reduce to the requirement of statistical independence with equal
probabilities, P_i ≈ N^{-1}. This is the requirement for large-scale uniformity (ρ(x, y, z) ≈ constant)
stated in the language of dynamical systems theory.

A necessary condition for homogeneity in a given direction x, at cosmological scales ≫
1000 h^{-1} Mpc, can also be stated as follows: on what scale l^(n) of cosmologic coarsegraining can
a staircase P(x) of N steps (P(x) denotes the empirical distribution P(x, y, z) with y and z
held constant) be approximated by a differentiable distribution, P'(x) ≈ ρ(x), where ρ(x) is
smooth, approximately analytic? There are two requirements: the number n_i of data points in
each interval must be very large, and the spacing between points cannot be very different from
Δx ≈ 1/N. This is the same as saying that the steps in P(x) are nearly uniform and of very
small width, and lie approximately on the straight line P(x) ≈ x with slope ρ(x) ≈ 1 (uniform
density). If the pointwise spacing is not exactly Δx ≈ 1/N, but D_H = 3 and there are no voids and
clustering, then the staircase may be a smooth deformation of the constant density distribution
P(x) ≈ x with variable but nearly smooth slope ρ(x) ≈ P'(x) approximating a nonuniform smooth
density. The necessary and sufficient condition for large-scale homogeneity, stated in the language
of statisticians, is given in Stoyan et al. fnj. Contrary to the advice of part 9 in |1| we point
out that "pencil beam surveys" generally cannot be treated as one-dimensional cuts. In order to
qualify as a one-dimensional cut, the maximum width of a pencil beam survey should be on the
order of the size of a galaxy.

If we think in terms of hydrodynamic models of clustering, then space-filling supports, by 
Liouville's theorem, require conservative dynamical systems. A smooth density at large scales 
cannot be the result of dissipative hydrodynamics. 

No empirical test can be performed globally on the scale of the universe. The best that one 
can hope for is to gain information about the different local distributions of matter for various 
different samples of galaxies and clusters of galaxies at scales r ≫ 1000 h^{-1} Mpc and test them for
homogeneity and isotropy (our Euclidean language is applicable locally in a curved space-time). 
If the scales required to exhibit evidence for homogeneity and isotropy should fall beyond the 
inherent limitations on all future observations, then the cosmological principle is not falsifiable 
and is only a matter of belief. 

A more useful question is how to combine fractal or multifractal distributions with
hydrodynamics, now defining the density

\rho(\vec{x}) = \sum_{i=1}^{N} m_i \, \delta(\vec{x} - \vec{x}_i)    (44)

to include the masses of galaxies, as is done in part 6 of [ ]. In a related publication |6^] it is
asserted that ". . . phenomena in which intrinsic self-similar irregularities develop at all scales and 
fluctuations cannot be described in terms of analytic functions. The theoretical methods used to 
describe this situation could not be based on ordinary differential equations because self-similarity 
implies the absence of analyticity and the familiar mathematical physics becomes inapplicable." 
These claims are patently false: there is certainly no reason why one cannot study the dynamics
of eq. (44) as an N-body problem using nonlinear differential equations, as was done by
mathematicians from Laplace to Poincaré and beyond! Why was such a sweeping statement made to
begin with? Clearly, by the extrapolation of length scales to zero, to the empirically and physically
meaningless mathematical limit where, essentially, two galaxies occupy the same position. Only
in this limit are fractal densities nonanalytic enough to be completely nondifferentiable.



In fact the coarsegrained picture of empirical distributions formulated in parts l.t and 3.6
above leads in principle to a hydrodynamic description in terms of the usual differential equations
of mathematical physics. At any desired resolution ≈ l_n, simply represent the density by

i=l 

where (see eq. (44) above) ρ_i is the coarsegrained mass density for a partitioning {l_i}. One can
certainly study the stability of this distribution (taken as an initial condition) via the usual
differential equations of hydrodynamics, even if this may require solutions in the weak (or distribution)
sense. The results will not be correct at length scales smaller than the scales {l_i}, but at smaller
scales (down to l_min > 0) one can increase N_n (decreasing the size of intervals l_i) and again study
stability questions via a finer grained density of the form of (45). Uniform densities on space-filling
supports in Newtonian cosmology are unstable if the universe is open, stable if the Euclidean
manifold is a flat 3-torus ||. In other words, homogeneity is globally unstable in an open Newtonian
universe.
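
The idea of refining a coarsegrained mass density can be illustrated in one dimension. The sketch below is not the expression (45) itself: it assumes equal cells on the unit interval (the text allows a general partitioning), and the positions and masses are invented purely for illustration. Each cell's density is the mass it contains divided by the cell size.

```python
def coarsegrained_density(positions, masses, n_cells):
    """Mass per unit length in each of n_cells equal cells covering [0, 1]."""
    cell_size = 1.0 / n_cells
    cell_mass = [0.0] * n_cells
    for x, m in zip(positions, masses):
        i = min(int(x / cell_size), n_cells - 1)   # clamp x == 1.0 into last cell
        cell_mass[i] += m
    return [m / cell_size for m in cell_mass]

positions = [0.05, 0.10, 0.12, 0.80]   # hypothetical 1-D positions
masses = [1.0, 2.0, 1.0, 4.0]          # hypothetical masses

coarse = coarsegrained_density(positions, masses, n_cells=2)
fine = coarsegrained_density(positions, masses, n_cells=4)
print(coarse)
print(fine)
```

In this toy example the two-cell density looks uniform, while the four-cell density exposes empty cells, which is the sense in which results at one resolution cannot be trusted at smaller scales.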



9 Platonic expectations? 

The standard model of cosmology ({nj, |6l| ) is a paradigm of simplicity: the simplest possible 
solution of Einstein's field equations (global integrability based upon global symmetry) is combined 
with what Feynman has called "the usual initial conditions of physics" : random or thermal initial 
conditions. Feynman pointed out that biologists, geologists, and astronomers know that the 
usual initial conditions of physics (and integrable dynamics, one must add) cannot be used to 
explain most of the phenomena that are observed in nature (j2^j). Cosmologically seen, far from
equilibrium phenomena occur at relatively small scales. That these nonequilibrium nonuniformities
at small scales should be consistent with perfect symmetry and random initial conditions at the
largest scales is not at all clear. If it were true then, as Plato Q believed, the heavens
would be perfect while all disorder is confined to the "sphere of the earth" (extended a bit, out to
150 h^{-1} Mpc, at least).

It is not necessary to assume that the galaxy distribution requires the nineteenth century notion 
of randomness except perhaps as a sometimes convenient approximation to deterministic chaotic 
dynamics. We now understand how even the simplest chaotic dynamical systems can generate 
all possible histograms that can be constructed empirically (p^j). By randomness we mean a
breakdown of the space-time description of cause and effect (as in quantum mechanics). Statistical
independence (p8|]), in contrast, is a separate idea that occurs in deterministic dynamics. Physics
in the last twenty years has begun to follow more the path laid out by Poincaré (@), deviating
from the traditional path set down by Boltzmann (contrast the traditional emphasis on randomness
found in [ ] and in |81|, for example, with the perspective on randomness expressed in p(i|]).

A deterministic dynamical system (like the differential equations that generate the character- 
istic curves of Newtonian or Einsteinian cosmology) can generate inhomogeneity that need not 
be fractal. A dynamical system far from thermodynamic equilibrium does not generate a unique 
probability distribution, but instead generates infinitely many different classes of distributions de- 
pending on classes of initial conditions. We would have no way to discover "the initial conditions 
of the universe" other than by accurate backward integrations in time starting from the present 
(empirically unknown) distribution of matter via the correct equations of motion. The laws of 
physics alone do not tell us anything about initial conditions (@|). In the far from equilibrium 
case, on small scales, we know that mother nature has not chosen "the usual initial conditions 
of physics" (trees don't grow from thermal equilibrium initial conditions, but arise instead from 
strong driving combined with dissipation). 

Even if we knew the present cosmologic-scale distribution of matter, the question whether
the global matter distribution could have been generated from a thermally equilibrated initial
state cannot be answered by N-body simulations, because no existing computer can reproduce,
in backward integration in time, even the first digits of those initial conditions after integrations
forward in time over billions of years. Accurate backward integrations were actually accomplished,
for unknown reasons and for a very restricted range of energies, by building a special computer
to try to simulate the evolution of the solar system via a chaotic symplectic map over millions of
years ([n|). One objective was to try to understand the initial conditions that initially fascinated
Kepler, the so-called Titius "law" (see also Q).

The characteristic curves of the partial differential equations of Newtonian cosmology are 
generated by a dynamical system with a phase space of at least six dimensions (three degrees of 
freedom). The Lagrangian method j^] studies characteristics via backward integration in time, 
using the fact that initial conditions are trivially conserved along streamlines. Chaotic dynamics 
requires only a three dimensional phase space. Complex dynamics, dynamics equivalent to a 
universal computer (no scaling laws, no generating partition, nothing to aid in forecasting the 
future statistically) may occur in certain Newtonian systems with only three degrees of freedom 
(HP). There, scaling laws, attractors, and the cardinality of strange sets can at best be defined 
only locally, if at all. It is unknown whether (any form of) fluid turbulence or the Newtonian
three body problem falls into the complexity category. That the universe should be describable
by a completely integrable dynamical system, even at the largest scale of coarsegraining, seems
unlikely in the light of what we now understand about deterministic dynamics. We are reminded
that the cosmological principle is not demanded by any known laws of physics, and is not itself a 
separate law of nature. 

This article was written with the benefit of hindsight. The author belongs to the subset of 
physicists who, in the past, have claimed evidence for τ(q) on the basis of log-log plots that did
satisfy the Geilo Criterion. 